<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ABSTRACT BOOK</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>CH2M HILL Alumni Conference Center Oregon State University</institution>
          ,
          <addr-line>Corvallis, OR</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>1</fpage>
      <lpage>4</lpage>
      <kwd-group>
        <kwd>Food</kwd>
        <kwd>Nutrition</kwd>
        <kwd>Health and Environment for the 9 billion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.</p>
      <p>Thank you to our conference sponsors.</p>
      <p>Committees</p>
    </sec>
    <sec id="sec-2">
      <title>ICBO BioCreative 2016 Organizing Committee</title>
      <sec id="sec-2-1">
        <title>Conference Chair: Pankaj Jaiswal, Oregon State University (OSU), USA</title>
      </sec>
      <sec id="sec-2-2">
        <title>Program Chair: Robert Hoehndorf, KAUST, Saudi Arabia</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Sponsoring and Publicity:</title>
      <sec id="sec-3-1">
        <title>Sivaram Arabandi, ONTOPRO, USA</title>
      </sec>
      <sec id="sec-3-2">
        <title>Dave Clements, Johns Hopkins, Baltimore, USA</title>
      </sec>
      <sec id="sec-3-3">
        <title>Posters and Demos: Paul Schofield, University of Cambridge, UK</title>
      </sec>
      <sec id="sec-3-5">
        <title>Conference Logistics: Oregon State University Conference Services (Donna Williams, Jill Soth, Carly Weber, Jennifer Stotts)</title>
        <p>ICBO Program Committee
• Anika Oellrich, KCL, UK
• Austin Meier, Planteome, OSU, USA
• Barry Smith, NCBO and University at Buffalo, USA
• Cecilia Arighi, BioCreative &amp; University of Delaware, USA
• Chris Mungall, LBNL, USA
• Elizabeth Arnaud, Bioversity, France
• Eugene Zhang, OSU, USA
• Filipe Santana da Silva, Universidade Federal de Pernambuco, Brazil
• Georgios Gkoutos, University of Birmingham, UK
• Helen Parkinson, EBI, UK
• Laurel Cooper, Planteome, OSU, USA
• Marie Angélique Laporte, Bioversity, France
• Mark Jensen, University at Buffalo, USA
• Mark Schildhauer, NCEAS, USA
• Mark Wilkinson, UPM, Spain
• Mary Dolan, Jackson Laboratories, USA
• Matthew Brush, Monarch Initiative, OHSU, USA
• Nicole Vasilevsky, Monarch Initiative, OHSU, USA
• Paul Schofield, University of Cambridge, UK
• Pier Luigi Buttigieg, ENVO, Max-Planck-Institute, Germany
• Prashanti Manda, UNC Chapel Hill, USA
• Stefan Schulz, Medizinische Universität Graz, Austria
• William (Bill) Hogan, University of Florida, USA
• Yongqun (Oliver) He, University of Michigan, USA</p>
        <p>BioCreative Organizing Committee
• Cecilia Arighi, University of Delaware, USA
• Alfonso Valencia, Spanish National Cancer Centre, CNIO, Spain
• Cathy Wu, University of Delaware and Georgetown University, USA
• Donal Comeau, National Center for Biotechnology Information (NCBI), NIH, USA
• Fabio Rinaldi, Institute of Computational Linguistics, University of Zurich, Switzerland
• Kevin Cohen, University of Colorado, USA
• Lynette Hirschman, MITRE Corporation, USA
• Martin Krallinger, Spanish National Cancer Centre, CNIO, Spain
• Rezarta Islamaj Dogan, National Center for Biotechnology Information (NCBI), NIH, USA
• Sun Kim, National Center for Biotechnology Information (NCBI), NIH, USA
• Zhiyong Lu, National Center for Biotechnology Information (NCBI), NIH, USA</p>
        <p>
BP01:
Ignet: A centrality and INO-based web system for analyzing and visualizing literature-mined
networks
Arzucan Ozgur, Junguk Hur, Zuoshuang Xiang, Edison Ong, Dragomir Radev and Yongqun
He
Abstract: Ignet (Integrative Gene Network) is a web-based system for dynamically
updating and analyzing gene interaction networks mined from all PubMed abstracts.
Four centrality metrics, namely degree, eigenvector, betweenness, and closeness, are
used to determine the importance of genes in the networks. Interaction types between
genes are classified using the Interaction Network Ontology (INO), which organizes
interaction types in an ontological hierarchy along with the individual keywords
listed for each interaction type. An interactive user interface is designed to explore the
interaction network as well as the centrality- and ontology-based network analysis.
Availability: http://ignet.hegroup.org.</p>
        <p>BP02:
Disease Named Entity Recognition Using NCBI Corpus
Thomas Hahn, Hidayat Ur Rahman and Richard Segall
Abstract: Named Entity Recognition (NER) in biomedical literature is a very active
research area. NER is a crucial component of biomedical text mining because it allows
for information retrieval, reasoning and knowledge discovery. Much research has been
carried out in this area using semantic type categories, such as “DNA”, “RNA”,
“proteins” and “genes”. However, disease NER has not received its needed attention
yet, specifically human disease NER. Traditional machine learning approaches lack the
precision for disease NER, due to their dependence on token level features, sentence
level features and the integration of features, such as orthographic, contextual and
linguistic features. In this paper a method for disease NER is proposed which utilizes
sentence and token level features based on Conditional Random Fields using the NCBI
disease corpus. Our system utilizes rich features including orthographic, contextual,
affixes, bigrams, part of speech and stem based features. Using these feature sets our
approach achieved a maximum F-score of 94% on the training set by applying 10-fold
cross validation for semantic labeling of the NCBI disease corpus. On the testing and
development corpora, the model achieved F-scores of 88% and 85%, respectively.</p>
        <p>BP03:
Label Embedding Approach for Transfer Learning
Rasha Obeidat, Xiaoli Fern and Prasad Tadepalli
Abstract: Automatically tagging textual mentions with the concepts, types and entities
that they represent is an important task for which supervised learning has been found to
be very effective. In this paper, we consider the problem of exploiting multiple sources
of training data with variant ontologies. We present a new transfer learning approach
based on embedding multiple label sets in a shared space, and using it to augment the
training data.</p>
        <p>BIT101-D204:
Large-scale Semantic Indexing with Biomedical Ontologies
Chih-Hsuan Wei, Robert Leaman and Zhiyong Lu
Abstract: We introduce PubTator, a web-based application that enables large-scale
semantic indexing and automatic concept recognition in biomedical ontologies. Not only
was PubTator formally evaluated and top-rated in BioCreative, it also has been widely
adopted and used by the scientific community from around the world, supporting both
research projects and real-world applications in biocuration, crowdsourcing and
translational bioinformatics.</p>
        <p>BIT102:
One tagger, many uses: Illustrating the power of ontologies in dictionary-based named
entity recognition
Lars Juhl Jensen
Abstract: Automatic annotation of text is an important complement to manual
annotation, because the latter is highly labour intensive. We have developed a fast
dictionary-based named entity recognition (NER) system and addressed a wide variety
of biomedical problems by applying it to text from many different sources. We have used
this tagger both in real-time tools to support curation efforts and in pipelines for
populating databases through bulk processing of entire Medline, the open-access subset
of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and
the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves
80–90% precision and 70–80% recall. Many of the underlying dictionaries were built
from open biomedical ontologies, which further facilitate integration of the text-mining
results with evidence from other sources.</p>
        <p>BIT103:
Scalable Text Mining Assisted Curation of PTM Proteoforms in the Protein Ontology
Karen Ross, Darren Natale, Cecilia Arighi, Sheng-Chih Chen, Hongzhan Huang, Gang Li,
Jia Ren, Michael Wang, K Vijay-Shanker and Cathy Wu
Abstract: The Protein Ontology (PRO) defines protein classes and their interrelationships
from the family to the protein form (proteoform) level within and across species. One of
the unique contributions of PRO is its representation of post-translationally modified
(PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has
been relatively slow due to the extensive manual curation effort required. Here we
report an automated pipeline for creation of PTM proteoform classes that leverages two
phosphorylation-focused text mining tools (RLIMS-P, which detects mentions of kinases,
substrates, and phosphorylation sites, and eFIP, which detects phosphorylation-dependent
protein-protein interactions (PPIs)) and our integrated PTM database,
iPTMnet. By applying this pipeline, we obtained a set of ~820 substrate-site pairs that
are suitable for automated PRO term generation with literature-based evidence
attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific
PTM proteoforms by 50%. Many of these new proteoforms also have associated
kinase and/or PPI information. Finally, we show a phosphorylation network for the
human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1) derived from our
dataset that demonstrates the biological complexity of the information we have
extracted. Our approach addresses scalability in PRO curation and will be further
expanded to advance PRO representation of phosphorylated proteoforms.</p>
        <p>BIT104:
Cardiovascular Health and Physical Activity: A Model for Health Promotion and Decision
Support Ontologies
Vimala Ponna, Aaron Baer and Matthew Lange
Abstract: Current cardiovascular disease decision support systems (DSS) rely primarily
on ontologies that characterize and quantify disease, recommending appropriate
pharmacotherapy (PT) and/or surgical interventions (SI). PubMed and Google Scholar
searches reveal no specific ontologies or literature related to DSS for recommending
physical activity (PA) and diet interventions (DI) for cardiovascular health and fitness
(CVHF) improvement. This dearth of CVHF-PA/DI structured knowledge repositories has
resulted in a scarcity of user-friendly tools for scientifically validated information
retrieval about CVHF improvement. Advancement of health science depends on timely
development and implementation of health (rather than disease) ontologies. We
developed a time-efficient workflow for constructing/maintaining structured knowledge
repositories capable of providing informational underpinnings for CVHF-PA/DI
ontologies and DSS that support health promotion, including precise, personalized
exercise prescription. This workflow creates conceptual lattices about effects of varied
PA on CVHF. These conceptual maps lay the foundation for accelerated creation of
health-focused ontologies, which ultimately equip DSS with CVHF knowledge related to PA
and DI.</p>
        <p>BIT105-D106-BP04:
A Web Application for Extracting Key Domain Information for Scientific Publications using
Ontology
Weijia Xu, Amit Gupta, Pankaj Jaiswal, Crispin Taylor and Patti Lockhart
Abstract: We present demos of an ongoing project, domain informational vocabulary
extraction (DIVE), which aims to enrich digital publications through entity and key
informational words detection and by adding additional annotations. The system
implements multiple strategies for biological entity detection, including using regular
expression rules, ontologies, and a keyword dictionary. These extracted entities are then
stored in a database and made accessible through an interactive web application for
curation and evaluation by authors. Through the web interface, the user can make
additional annotations and corrections to the current results. The updates can then be
used to improve the entity detection in subsequent processed articles. Although the
system is being developed in the context of annotating journal articles, it can also be
beneficial to domain curators and researchers at large.</p>
        <p>BIT106:
Use of text mining for Experimental Factor Ontology coverage expansion in the scope of
target validation
Senay Kafkas, Ian Dunham, Helen Parkinson and Johanna McEntyre
Abstract: Understanding the molecular biology and development of disease plays a key
role in drug development. Integrating evidence from different experimental approaches
with data available from public resources (such as gene expression level changes and
reaction pathways affected by pathogenic mutations) can be a powerful approach for
evaluating different aspects of target-disease associations. The application of ontologies
is of fundamental importance to effective integration. The Target Validation Platform is
a user-friendly interface that integrates such evidence from various resources with the
aim of assisting scientists to identify and prioritise drug targets. Currently, the EFO is
used as the reference ontology for diseases in the platform, importing terms from
existing disease ontologies such as the Human Phenotype Ontology as required. In order
to generalize the use of EFO from key target-diseases for wider use, we need to
compare the target associated disease coverage in EFO with the scope of other available
disease terminology resources. In this study, we address this issue by using text mining
and present our initial results.</p>
        <p>D101:
Plant Image Segmentation and Annotation with Ontologies in BisQue
Justin Preece, Justin Elser, Pankaj Jaiswal, Kris Kvilekval, Dmitry Fedorov, B.S.
Manjunath, Ryan Kitchen, Xu Xu, Dmitrios Trigkakis, Sinisa Todorovic and Seth Carbon
Abstract: The field of computer vision has experienced much progress in the last two
decades. Image analysis of photography and video has moved out of computer science
research labs and into a wide range of applications. One example of progress in image
analysis concerns the segmentation of images on the basis of gray scale, color hue,
texture, geometry, and other features. Such image segmentation allows for increasingly
refined classification of images and their components. In a parallel development,
semantic computing has pursued the creation of ontologies in hopes of capturing and
defining what it is we “know” about the world, and presenting it in the form of a
terminology network connected by defined relationships. This knowledge network is
computable, and makes it possible to make logical inferences about facts and data
annotated with ontology terms.</p>
        <p>D102:
SPARQL2OWL: towards bridging the semantic gap between RDF and OWL
Mona Alsharani, Hussein Almashouq and Robert Hoehndorf
Abstract: Several large databases in biology are now making their information available
through the Resource Description Framework (RDF). RDF can be used for large datasets
and provides a graph-based semantics. The Web Ontology Language (OWL), another
Semantic Web standard, provides a more formal, model-theoretic semantics. While
some approaches combine RDF and OWL, for example for querying, knowledge in RDF
and OWL is often expressed differently. Here, we propose a method to generate OWL
ontologies from SPARQL queries using n-ary relational patterns. Combined with
background knowledge from ontologies, the generated OWL ontologies can be used for
expressive queries and quality control of RDF data. We implement our method in a
prototype tool available at https://github.com/bio-ontology-research-group/SPARQL2OWL.</p>
        <p>D103-W12-05-IP36:
The Phenoscape Knowledgebase: tools and APIs for computing across phenotypes from
evolutionary diversity and model organisms
James Balhoff
Abstract: The Phenoscape Knowledgebase (KB) is an ontology-driven database that
combines existing phenotype annotations from model organism databases with new
phenotype annotations from the evolutionary literature. Phenoscape curators have
created phenotype annotations for more than 5,000 species and higher taxa, by defining
computable phenotype concepts for more than 20,000 character states from over 160
published phylogenetic studies. These phenotype concepts are in the form of Entity–
Quality (EQ) [1] compositions which incorporate terms from the Uberon anatomy
ontology, the Biospatial Ontology (BSPO), and the Phenotype and Trait Ontology (PATO).
Taxonomic concepts are drawn from the Vertebrate Taxonomy Ontology (VTO). This
knowledge of comparative biodiversity is linked to potentially relevant developmental
genetic mechanisms by importing associations of genes to phenotypic effects and gene
expression locations from zebrafish (ZFIN [2]), mouse (MGI [3]), Xenopus (Xenbase [4]),
and human (Human Phenotype Ontology project [5]). Thus far, the Phenoscape KB has
been used to identify candidate genes for evolutionary phenotypes [6], to match
profiles of ancestral evolutionary variation with gene phenotype profiles [7], and to
combine data across many evolutionary studies by inferring indirectly asserted values
within synthetic supermatrices [8]. Here we describe the software architecture of the
Phenoscape KB, including data ingestion, integration of OWL reasoning, web service
interface, and application features (Fig. 1).</p>
        <p>D104:
Updates to the AberOWL ontology repository
Miguel Ángel Rodríguez-García, Luke Slater, Imane Boudellioua, Paul Schofield, Georgios
Gkoutos and Robert Hoehndorf
Abstract: A large number of ontologies have been developed in the biological and
biomedical domains, which are mostly expressed in the Web Ontology Language (OWL).
These ontologies form a logical foundation for our knowledge in these domains, and
they are in widespread use to annotate biomedical and biological datasets. The use of
the semantics provided by ontologies requires the use of automated reasoning –
inferring new knowledge by evaluating the asserted axioms. AberOWL is an ontology
repository which utilises an OWL 2 EL reasoner to provide semantic access to classified
ontologies. Since our original presentation of the AberOWL framework, we have
developed several additional tools and features which enrich its ability to integrate and
explore data and to make use of the semantic and inferred content of ontologies. Here we
present an overview of AberOWL and the enhancements and new features which have
been developed since its conception. AberOWL is freely available at http://aber-owl.net.</p>
        <p>D106-BP04:
Enhancing Information Accessibility of Publications with Text Mining and Ontology
Weijia Xu, Amit Gupta, Pankaj Jaiswal, Crispin Taylor and Patti Lockhart
Abstract: We present an ongoing effort to use text mining methods and existing
biological ontologies to help readers access the information contained in scientific
articles. Our approach combines multiple strategies for biological entity detection
with association analysis of the extracted entities. The entity extraction process
uses regular expression rules, ontologies, and a keyword dictionary to obtain a
comprehensive list of biological entities. In addition to extracting a list of entities, we
apply natural language processing and association analysis techniques to generate
inferences among entities and compare them to known relations documented in the
existing ontologies.</p>
        <p>D201:
Ontobull and BFOConvert: Web-based programs to support automatic ontology conversion
Edison Ong, Zuoshuang Xiang, Jie Zheng, Barry Smith and Yongqun He
Abstract: When a widely reused ontology appears in a new version which is not
compatible with older versions, the ontologies reusing it need to be updated
accordingly. Ontobull (http://ontobull.hegroup.org) has been developed to
automatically update ontologies with new term IRI(s) and associated metadata to take
account of such version changes. To use the Ontobull web interface a user is required to
(i) upload one or more ontology OWL source files; (ii) input an ontology term IRI
mapping; and (where needed) (iii) provide update settings for ontology headers and
XML namespace IDs. Using this information, the backend Ontobull Java program
automatically updates the OWL ontology files with desired term IRIs and ontology
metadata. The Ontobull subprogram BFOConvert supports the conversion of an
ontology that imports a previous version of BFO. A use case is provided to demonstrate
the features of Ontobull and BFOConvert.</p>
        <p>D202:
Reusing the NCBO BioPortal technology for agronomy to build AgroPortal
Clement Jonquet, Anne Toulet, Elizabeth Arnaud, Sophie Aubin, Esther Dzale Yeumo,
Vincent Emonet, John Graybeal, Mark A. Musen, Cyril Pommier and Pierre Larmande
Abstract: Many vocabularies and ontologies are produced to represent and annotate
agronomic data. By reusing the NCBO BioPortal technology, we have already designed
and implemented an advanced prototype ontology repository for the agronomy
domain. We plan to turn that prototype into a real service to the community. The
AgroPortal project aims at reusing the scientific outcomes and experience of the
biomedical domain in the context of plant, agronomic, food, environment (perhaps
animal) sciences. We offer an ontology portal that features ontology hosting, search,
versioning, visualization, commenting, recommendation, and semantic annotation, as
well as storage and exploitation of ontology alignments, all within a fully Semantic
Web-compliant infrastructure. AgroPortal pays specific attention to respecting the
requirements of the agronomic community in terms of ontology formats (e.g., SKOS,
trait dictionaries) or supported features. In this paper, we present our prototype as well
as preliminary outputs of four driving agronomic use cases. With the experience
acquired in the biomedical domain and building atop an already existing technology,
we think that AgroPortal offers a robust and stable reference repository that will
become highly valuable for the agronomic domain.</p>
        <p>D203:
Humane OWL: RDF and OWL for Humans
James A. Overton
Abstract: Humane OWL (HOWL) is a syntax for RDF and OWL designed for manual
editing. By allowing human-readable labels to be used in place of IRIs, and providing
convenient syntax for OWL annotations and expressions, HOWL files can be used like
source code with tools such as GitHub, then translated into any other RDF or OWL
format for use with other tools.</p>
        <p>D205:
Easy Extraction of Terms and Definitions with OWL2TL
John Judkins, Joseph Utecht and Mathias Brochhausen
Abstract: Facilitating good communication between semantic web specialists and
domain experts is necessary to efficient ontology development. This development may
be hindered by the fact that domain experts tend to be unfamiliar with tools used to
create and edit OWL files. This is true in particular when changes to definitions need to
be reviewed as often as multiple times a day. We developed “OWL to Term List”
(OWL2TL) with the goal of allowing domain experts to view the terms and definitions of
an OWL file organized in a list that is updated each time the OWL file is updated. The
tool is available online and currently generates a list of terms, along with additional
annotation properties that are chosen by the user, in a format that allows easy copying
into a spreadsheet.</p>
        <p>IP02:
Adding evidence type representation to DIDEO
Mathias Brochhausen, Philip E. Empey, Jodi Schneider, William R. Hogan and Richard D.
Boyce
Abstract: In this poster we present novel development and extension of the Drug-drug
Interaction and Drug-drug Interaction Evidence Ontology (DIDEO). We demonstrate how
reasoning over this extension of DIDEO can a) automatically create a multi-level
hierarchy of evidence types from descriptions of the underlying scientific observations
and b) automatically subsume individual evidence items under the correct evidence
type. Thus DIDEO will enable evidence items added manually by curators to be
automatically categorized into a drug-drug interaction framework with precision and
minimal effort from curators. As with all previous DIDEO development this extension is
consistent with OBO Foundry principles.</p>
        <p>Multi-species Ontologies of the Craniofacial Musculoskeletal System
Jose Leonardo Mejino, James Brinkley, Timothy Cox and Landon Detwiler
Abstract: We created the Ontology of Craniofacial Development and Malformation
(OCDM) [1] to provide a unifying framework for organizing and integrating craniofacial
data ranging from genes to clinical phenotypes across multiple species. Within this
framework we focused on spatio-structural representation of anatomical entities
related to craniofacial development and malformation, such as craniosynostosis and
midface hypoplasia. Animal models are used to support human studies and so we built
multi-species ontologies that would allow for cross-species correlation of anatomical
information. For this purpose we first developed and enhanced the craniofacial
component of the human musculoskeletal system in the Foundational Model of
Anatomy Ontology (FMA)[2], and then imported this component, which we call the
Craniofacial Human Ontology (CHO), into the OCDM. The CHO was then used as a
template to create the anatomy for the mouse, the Craniofacial Mouse Ontology (CMO)
as well as for the zebrafish, the Craniofacial Zebrafish Ontology (CZO).</p>
        <p>EGO: a biomedical ontology for integrative epigenome representation and analysis
Yongqun He, Zhaohui Qin and Jie Zheng
Abstract: Epigenomics is crucial to understanding biological mechanisms beyond genomic
DNA. To better represent epigenomic knowledge and support data integration, we
developed a prototype Epigenome Ontology (EGO). The EGO top-level hierarchy and design
pattern are presented with an illustrative use case. EGO is proposed for statistical
analysis of enriched epigenomic features based on given sequence data input.</p>
        <p>IP05:
An Ontological Representation for the Transtheoretical Theory
Hua Min, Robert H. Friedman and Julie Wright
Abstract: Ontologies are widely used in computer science and medicine. Ontologies may
be useful in health promotion and disease prevention for intervention development.
Interventionists usually use theory to guide intervention design and evaluation, but
there is no standard vocabulary for health behavior theory. A formal mechanism for
converting theory to a computer-based representation may provide a tool that can
assist in the development of computer-based interventions. This paper demonstrates
how ontology can be used to represent a health behavior theory using the
Transtheoretical Model (TTM) of behavior change as an example.</p>
        <p>IP06:
Building a molecular glyco-phenotype ontology to decipher undiagnosed diseases
Jean-Philippe Gourdine, Thomas Metz, David Koeller, Matthew Brush and Melissa
Haendel
Abstract: Hundreds of rare diseases are due to mutations in genes related to glycan
synthesis, degradation or recognition. These glycan-related defects are well described in
the literature but largely absent in ontologies and databases of chemical entities and
phenotypes, limiting the application of computational methods and ontology-driven
tools for characterization and discovery of glycan related diseases. We are curating
articles and textbooks in glycobiology related to genetic diseases to inform the content
and the structure of an ontology of Molecular Glyco-Phenotypes (MGPO). MGPO will be
applied toward use cases including disease diagnosis and disease gene candidate
prioritization, using semantic similarity and pattern matching at the glycan level with
glycomics data from patients of the Undiagnosed Diseases Network.</p>
        <p>The Cell Line Ontology integration and analysis of the knowledge of LINCS cell lines
Edison Ong, Jiangan Xie, Zhaohui Ni, Qingping Liu, Yu Lin, Vasileios Stathias, Caty Chung,
Stephan Schurer and Yongqun He
Abstract: Cell lines are crucial to study molecular signatures and pathways, and are
widely used in the NIH Common Fund LINCS project. The Cell Line Ontology (CLO) is a
community-based ontology representing and classifying cell lines from different
resources. To better serve the LINCS research community, from the LINCS Data Portal
and ChEMBL, we identified 1,097 LINCS cell lines, among which 717 cell lines were
associated with 121 cancer types, and 352 cell line terms did not exist in CLO. To
harmonize LINCS cell line representation and CLO, CLO design patterns were slightly
updated to add new information about the LINCS cell lines, including different database
cross-reference IDs. A new shortcut relation was generated to directly link a cell line to
the disease of the patient from whom the cell line originated. After new LINCS cell
lines and related information were added to CLO, a CLO subset/view (LINCS-CLOview) of
LINCS cell lines was generated and analyzed to identify scientific insights into these
LINCS cell lines. This study provides a first use case showing how CLO can be updated
and applied to support cell line research from a specific research community or project
initiative.</p>
        <p>Gold-Standard Ontology-Based Annotation of Concepts in Biomedical Text in the CRAFT
Corpus: Updates and Extensions
Michael Bada, Nicole Vasilevsky, Melissa Haendel and Lawrence Hunter
Abstract: Ontologies are increasingly used for semantic integration across disparate
curated biomedical resources, while gold-standard annotated corpora are needed for
accurate training and evaluation of text-mining tools. Bringing together the respective
power of these, we created the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a
collection of full-length, open-access biomedical journal articles that have been
manually annotated both syntactically and semantically with select Open Biomedical
Ontologies (OBOs), the first release of which includes ~100,000 annotations of concepts
mentioned in the text of 67 articles and mapped to the classes of eight prominent OBOs.
Here we present our continuing work on the corpus, including updated versions of these
annotations with newer versions of the ontologies, new annotations made with two
additional OBOs, annotations made with newly created extension classes defined in
terms of existing classes of the ontologies, and new annotations of roots of prefixed and
suffixed words.</p>
        <p>IT405:
Building Concordant Ontologies for Drug Discovery
Hande Küçük-Mcginty, Saurabh Metha, Yu Lin, Nooshin Nabizadeh, Vasileios Stathias,
Dusica Vidovic, Amar Koleti, Christopher Mader, Jianbin Duan, Ubbo Visser and Stephan
Schürer
Abstract: In this study we demonstrate how we interconnect three different ontologies,
the BioAssay Ontology (BAO), LINCS Information FramEwork ontology (LIFEo), and the
Drug Target Ontology (DTO). The three ontologies are built and maintained for three
different projects: BAO for the BioAssay Ontology Project, LIFEo for the Library of
Integrated Network-Based Cellular Signatures (LINCS) project, and DTO for the
Illuminating the Druggable Genome (IDG) project. DTO is a new ontology that aims to
formally describe drug target knowledge relevant to drug discovery. LIFEo is an
application ontology to describe information in the LIFE software system. BAO is a highly
accessed NCBO ontology; it has been extended formally to describe several LINCS
assays. The three ontologies use the same principal architecture that allows for re-use
and easy integration of ontology modules and instance data. Using the formal
definitions in DTO, LIFEo, and BAO and data from various resources, one can quickly
identify disease-relevant and tissue-specific genes, proteins, and prospective small
molecules. We show a simple use case example demonstrating knowledge-based linking
of life science data with the potential to empower drug discovery.</p>
        <p>IT406-IP35:
The Planteome Project
Laurel Cooper, Austin Meier, Justin Elser, Justin Preece, Xu Xu, Ryan Kitchen, Botong Qu,
Eugene Zhang, Sinisa Todorovic, Pankaj Jaiswal, Marie-Angélique Laporte, Elizabeth
Arnaud, Seth Carbon, Chris Mungall, Barry Smith, Georgios Gkoutos and John Doonan
Abstract: The Planteome project is a centralized online plant informatics portal which
provides semantic integration of widely diverse datasets with the goal of plant
improvement. Traditional plant breeding methods for crop improvement may be
combined with next-generation analysis methods and automated scoring of traits and
phenotypes to develop improved varieties. The Planteome project
(www.planteome.org) develops and hosts a suite of reference ontologies for plants
associated with a growing corpus of genomics data. Data annotations linking
phenotypes and germplasm to genomics resources are achieved by data transformation
and mapping species-specific controlled vocabularies to the reference ontologies.
Analysis and annotation tools are being developed to facilitate studies of plant traits,
phenotypes, diseases, gene function and expression and genetic diversity data across a
wide range of plant species. The project database and online resources give
researchers tools to search and browse the data, with remote access via APIs for semantic
integration with annotation tools and data repositories that provide resources for plant
biology, breeding, genomics and genetics.</p>
        <p>IT407:
Annotating germplasm to Planteome reference ontologies
Austin Meier, Laurel Cooper, Justin Elser, Pankaj Jaiswal and Marie-Angélique Laporte
Abstract: An expected use case of plant phenotype ontologies will be the identification
of germplasm containing particular traits of interest. If phenotype data from
experiments is annotated using ontologies, it makes sense to include annotations to
that germplasm source. A lack of standardized data formatting reduces the utility of
these data. Standardizing germplasm data, including links to germplasm databases or
distribution locations, improves collaboration and benefits both researchers and the
scientific community as a whole. All plant traits contained in the Planteome reference
ontologies are searchable, and interconnected through relationships in the ontology. All
data annotated to these reference ontologies will be displayed, shareable, and
computable through the Planteome website (www.planteome.org) and APIs. This
manuscript will discuss the advantages of standardizing germplasm trait annotation, and
the semi-automated process developed to achieve such standardization.</p>
        <p>IT501:
A Descriptive Delta for Identifying Changes in SNOMED CT
Christopher Ochs, Yehoshua Perl, Gai Elhanan and James Case
Abstract: SNOMED CT is a large and complex medical terminology. Thousands of editing
operations are applied to its content for each new release. Understanding what changed
in a release is important for the end user and SNOMED CT editors. Each SNOMED CT
release comes with release notes that provide a brief description of the changes that
occurred and a set of delta files that identify individual changes in the content. The
release notes are brief and changes to thousands of concepts may be described in a few
sentences, whereas the delta files contain tens of thousands of individual changes. To
better identify how SNOMED CT content changes between releases we introduce a
methodology of creating a descriptive delta that captures the editing operations that
were applied to SNOMED CT content in a given release in a more comprehensible form.
We use this methodology to analyze editing operations that were part of a recent
remodeling effort of the Congenital disease and Infectious disease subhierarchies in the
large Clinical finding hierarchy.</p>
        <p>IT502:
Visualizing the “Big Picture” of Change in NCIt’s Biological Processes
Yehoshua Perl, Christopher Ochs, Sherri de Coronado and Nicole Thomas
Abstract: The National Cancer Institute Thesaurus (NCIt) is a large and complex
ontology. NCIt is frequently updated; a new release is made available approximately
every month. Tracking structural changes in NCIt is important for the editors of its
content. In this paper we describe a methodology and tool using diff partial-area
taxonomies to visually summarize structural changes between two NCIt releases. Diff
partial-area taxonomies provide a comprehensible view of the overall impact of the
changes. This methodology is illustrated using the Biological Process hierarchy.
Specifically, we illustrate how diff partial-area taxonomies reflect change that occurred
due to major restructuring of this hierarchy between September 2004 and December
2004. During this time the hierarchy nearly doubled in size and a large portion of the
classes were extensively modified. Several kinds of change patterns are identified and
discussed.</p>
        <p>IT503:
Malaria study data integration and information retrieval based on OBO Foundry ontologies
Jie Zheng, Jashon Cade, Brian Brunk, David Roos, Chris Stoeckert, San James, Emmanuel
Arinaitwe, Bryan Greenhouse, Grant Dorsey, Steven Sullivan, Jane Carlton, Gabriel
Carrasco-Escobar, Dionicia Gamboa, Paula Maguina-Mercedes and Joseph Vinetz
Abstract: The International Centers of Excellence in Malaria Research (ICEMR) projects
involve studies to understand the epidemiology and transmission patterns of malaria in
different geographic regions. Two major challenges of integrating data across these
projects are: (1) standardization of highly heterogeneous epidemiologic data collected
by various ICEMR projects; (2) provision of user-friendly search strategies to identify and
retrieve information of interest from the very complex ICEMR data. We pursued an
ontology-based strategy to address these challenges. We utilized and contributed to the
Open Biological and Biomedical Ontologies to generate a consistent semantic
representation of three different ICEMR data dictionaries that included ontology term
mappings to data fields and allowed values. This semantic representation of ICEMR data
served to guide data loading into a relational database and presentation of the data on
web pages in the form of search filters that reveal relationships specified in the ontology
and the structure of the underlying data. This effort resulted in the ability to use a
common logic for storing and displaying data on study participants, their clinical visits,
and epidemiological information on their living conditions (dwelling) and geographic
location. Users of the Plasmodium Genomics Resource, PlasmoDB, accessing the ICEMR
data will be able to search for participants based on environmental factors such as type
of dwelling, location or mosquito biting rate, characteristics such as age at enrollment,
relevant genotypes or gender and visit data such as laboratory findings, diagnoses,
malaria medications, symptoms, and other factors.</p>
        <p>IT504:
OOSTT: a Resource for Analyzing the Organizational Structures of Trauma Centers and
Trauma Systems
Joseph Utecht, John Judkins, Mathias Brochhausen, Terra Colvin Jr., J. Neil Otte,
Nicholas Rogers, Robert Rose, Maria Alvi, Amanda Hicks, Jane Ball, Stephen M. Bowman,
Robert T. Maxson, Rosemary Nabaweesi, Rohit Pradhan, Nels D. Sanddal, M. Eduard
Tudoreanu and Robert Winchell
Abstract: The organizational structure of healthcare organizations has increasingly become
a focus of medical research. In the CAFÉ project we aim to provide a web service
enabling ontology-driven comparison of the organizational characteristics of trauma
centers and trauma systems. Trauma remains one of the biggest challenges to
healthcare systems worldwide. Research has demonstrated that coordinated efforts like
trauma systems and trauma centers are key components of addressing this challenge.
Evaluation and comparison of these organizations is essential. However, this research
challenge is frequently compounded by the lack of a shared terminology and the lack of
effective information technology solutions for assessing and comparing these
organizations. In this paper we present the Ontology of Organizational Structures of
Trauma systems and Trauma centers (OOSTT) that provides the ontological foundation
to CAFÉ's web-based questionnaire infrastructure. We present the usage of the
ontology in relation to the questionnaire and provide the methods that were used to
create the ontology.</p>
        <p>IT505:
Towards a Standard Ontology Metadata Model
Hua Min, Stuart Turner, Sherri de Coronado, Brian Davis, Trish Whetzel, Robert R.
Freimuth, Harold R. Solbrig, Richard Kiefer, Michael Riben, Grace A. Stafford, Lawrence
Wright and Riki Ohira
Abstract: Bio-ontologies are becoming increasingly important in semantic alignment for
data integration, information exchange, and semantic interoperability. Due to the large
number of emerging bio-ontologies, it is challenging for users to find suitable ontologies for their applications.
Therefore, it is important to have a consistent terminology metadata model and a
resource for discovering appropriate ontologies or other resources for use in annotating
data. This paper aims to seek a common, shareable, and comprehensive method to
create, disseminate, and consume metadata about terminology resources.</p>
        <p>An Ontological Framework for Representing Topological Information in Human Anatomy
Takeshi Imai, Emiko Shinohara, Masayuki Kajino, Ryota Sakurai, Kazuhiko Ohe, Kouji
Kozaki and Riichiro Mizoguchi
Abstract: Medical ontologies have been a focus of constant attention in recent years as
one of the fundamental techniques and knowledge bases for clinical decision support
applications. In this paper, we discuss the description framework of our anatomy
ontology with a focus on representing topological information, which is required for
anatomical reasoning in clinical decision support applications. Our framework has major
advantages over preceding studies with respect to: (1) representations of branching
sequence; (2) combined representation of relevant knowledge with the use of “general
structural component”; and (3) cooperation with the disease and abnormality
ontologies.</p>
        <p>Natural Language Definitions for the Leukemia Knowledge Domain
Amanda Damasceno De Souza and Maurício Barcellos Almeida
Abstract: The creation of natural language definitions is a phase of any methodology to build
formal ontologies. In order to reach formal definitions, one should first create natural
language definitions according to sound principles. We gather a set of principles
available in literature and organize them in a list of stages that one can use to create
good definitions in natural language. In order to test the set of principles, we conducted
a case study in which we create definitions in the domain of cancer, more specifically,
definitions for acute myeloid leukemia. After creating and validating the definition of
this specific kind of leukemia, we offer remarks about the experiment.</p>
        <p>Identifying Missing Hierarchical Relations in SNOMED CT from Logical Definitions Based on
the Lexical Features of Concept Names
Olivier Bodenreider
Abstract: Objectives. To identify missing hierarchical relations in SNOMED CT from
logical definitions based on the lexical features of concept names. Methods. We first
create logical definitions from the lexical features of concept names, which we
represent in OWL EL. We infer hierarchical (subClassOf) relations among these concepts
using the ELK reasoner. Finally, we compare the hierarchy obtained from lexical features
to the original SNOMED CT hierarchy. We review the differences manually for
evaluation purposes. Results. Applied to 15,833 disorder and procedure concepts, our
approach identified 559 potentially missing hierarchical relations, of which 78% were
deemed valid. Conclusions. This lexical approach to quality assurance is easy to
implement, efficient and scalable.</p>
        <p>IT602:
A Semantic Web Representation of Entire Populations
Daniel Welch, Amanda Hicks, Josh Hanna and William Hogan
Abstract: Accurately representing demographic realities is a critical component in
creating useful, agent-based epidemiological models of infectious disease. Synthetic
ecosystems are generated from Census data microsamples in a statistically-sound
manner to maintain population-level demographic characteristics. These highly detailed
representations of populations are the basis of many advanced simulations of infectious
disease epidemics. Creating a standard, machine-readable representation of synthetic
ecosystem data would enable easier use and integration with epidemic simulator
software. Here we describe an ontology-based representation in Resource Description
Framework (RDF) and Web Ontology Language (OWL) of version 1.0 of the 2010 U.S.
Synthetic Population database by RTI International. Our representation draws upon
applicable classes from several reference ontologies, including the Ontology of
Medically Related Social Entities (OMRSE). After failing to find suitable ontological
representations of several key data elements in the Synthetic Population dataset, we
created new classes in OMRSE for representing employment status, employee roles,
workplaces, residences, households, and age measurements. We loaded a test RDF
dataset (structured according to ontologies in OWL) of synthetic individuals into a
commercial triple store (Stardog) and validated the representation with SPARQL queries.</p>
        <p>IT603:
Improving the Semantics of Drug Prescriptions with a Realist Ontology
Jean-Francois Ethier, Ryeyan Taseen, Luc Lavoie and Adrien Barton
Abstract: Electronic prescriptions are supported as a means to reduce adverse drug
events, but the ambiguities and overspecificities of prescription semantics along with
their lack of standardization reduce adoption, limit interoperability and are potential
sources of error. Ontologies in the OBO Foundry, founded on realist methodology, have
been successful in fostering the logical, scientifically accurate data standards that the
domain of drug prescriptions is currently in need of. This paper illustrates some
problems regarding the structuring of current electronic prescriptions, and
demonstrates how the Prescription of Drugs Ontology (PDRO) addresses these issues
with improved semantics founded on OBO and realist principles. PDRO reuses classes
and object properties from IAO, OBI, OGMS, OMRSE and DRON, introducing new entities
within its scope and proposing entities within those of its imported domains that may
be useful to other health care and information artifact-related ontologies in the OBO
Foundry. PDRO aims at improving the semantics of drug prescriptions and prospectively
enabling the interoperability of prescription data.</p>
        <p>IT604:
Qualitative causal analyses of biosimulation models
Maxwell Neal, John Gennari and Daniel Cook
Abstract: We describe an approach for performing qualitative, systems-level causal
analyses on biosimulation models that leverages semantics-based modeling formats,
formal ontology, and automated inference. The approach allows users to quickly
investigate how a qualitative perturbation to an element within a model’s network (an
increment or decrement) propagates throughout the modeled system. To support such
analyses, we must interpret and annotate the semantics of the models, including both
the physical properties modeled and the dependencies that relate them. We build from
prior work understanding the semantics of biological properties, but here, we focus on
the semantics for dependencies, which provide the critical knowledge necessary for
causal analysis of biosimulation models. We describe augmentations to the Ontology of
Physics for Biology, via OWL axioms and SWRL rules, and demonstrate that a reasoner
can then infer how an annotated model’s physical properties influence each other in a
qualitative sense. Our goal is to provide researchers with a tool that helps bring the
systems-level network dynamics of biosimulation models into perspective, thus
facilitating model development, testing, and application.</p>
        <p>IT605:
SEPIO: A Semantic Model for the Integration and Analysis of Scientific Evidence
Matthew Brush, Kent Shefchek and Melissa Haendel
Abstract: The Scientific Evidence and Provenance Information Ontology (SEPIO) was
developed to support the description of evidence and provenance information for
scientific claims. The core model represents the relationships between claims, their lines
of evidence, and the data items that comprise this evidence, as well as the methods,
tools, and agents involved in the creation of these artifacts. SEPIO was initially
developed to support the data integration and analysis efforts of the Monarch Initiative,
where it provides a unified and computable representation of evidence and provenance
metadata for genotype-phenotype associations aggregated across diverse model
organism and clinical genetics databases. However, additional requirements were
collected from diverse community partners in an effort to provide a shared community
standard, with a core model that is domain independent and extensible to represent
any type of claim and its associated evidence. In this report we describe the structure
and principles behind the SEPIO model, and review its applications in support of data
integration, curation, knowledge discovery, and manual and computational evaluation
of scientific claims. The SEPIO ontology can be found at
http://github.com/monarchinitiative/SEPIO-ontology/blob/master/src/ontology/sepio.owl.</p>
        <p>IT606:
Measuring the importance of annotation granularity to the detection of semantic similarity
between phenotype profiles
Prashanti Manda, James P. Balhoff and Todd J. Vision
Abstract: In phenotype annotations curated from the biological and medical literature,
considerable human effort must be invested to select ontological classes that capture
the expressivity of the original natural language descriptions, and finer annotation
granularity can also entail higher computational costs for particular reasoning tasks. Do
coarse annotations suffice for certain applications? Here, we measure how annotation
granularity affects the statistical behavior of semantic similarity metrics. We use a
randomized dataset of phenotype profiles drawn from 57,051 taxon-phenotype
annotations in the Phenoscape Knowledgebase. We compared query profiles having
variable proportions of matching phenotypes to subject database profiles using both
pairwise and groupwise Jaccard (edge-based) and Resnik (node-based) semantic
similarity metrics, and compared statistical performance for three different levels of
annotation granularity: entities alone, entities plus attributes, and entities plus qualities
(with implicit attributes). All four metrics examined showed more extreme values than
expected by chance when approximately half the annotations matched between the
query and subject profiles, with a more sudden decline for pairwise statistics and a more
gradual one for the groupwise statistics. Annotation granularity had a negligible effect
on the position of the threshold at which matches could be discriminated from noise.
These results suggest that coarse annotations of phenotypes, at the level of entities with
or without attributes, may be sufficient to identify phenotype profiles with statistically
significant semantic similarity.</p>
        <p>IT701:
A Quality-Assurance Study of ChEBI
Hasan Yumak, Ling Chen, Michael Halper, Ling Zheng, Yehoshua Perl and Gai Elhanan
Abstract: Ontologies are important components of many health-information systems.
The Chemical Entities of Biological Interest (ChEBI) ontology has become a standard
reference for chemicals appearing in biological contexts. As such, assuring the quality of
its content is imperative. In fact, ChEBI has a dedicated Web page at which errors and
inconsistencies in its concepts can be reported. A study of the correctness of a random
sample of ChEBI concepts is carried out. The results show that quite a large number of
ChEBI concepts suffer from some kind of problematic modeling. For example, we found
that 15.5% of the sample concepts exhibited severe errors of commission, including
incorrect hierarchical (is a) and lateral relationships. Errors of omission were also
prevalent. The overall results of our quality-assurance (QA) study are presented.
Suggestions for enhancing the QA processes in place for ChEBI are discussed.</p>
        <p>IT702:
To MIREOT or not to MIREOT? A case study of the impact of using MIREOT in the
Experimental Factor Ontology (EFO)
Luke Slater, Georgios Gkoutos, Paul Schofield and Robert Hoehndorf
Abstract: MIREOT is a mechanism for the selective re-use of individual ontology classes
in other ontologies. Designed to minimise effort and to support orthogonality, it is now
in widespread use. The consequences for ontology integrity and automated reasoning of
using the MIREOT mechanism have so far not been fully assessed. In this paper, we
perform an analysis of the Experimental Factor Ontology (EFO), an ontology which uses
the MIREOT process to gather classes from a large range of other ontologies. Our study
examines the effect of combining EFO with the ontologies it references by actually
importing them into the EFO. We then evaluate the consistency and status of the
combined ontologies. Through our investigation, we reveal that EFO in combination
with all its referenced ontologies is logically inconsistent. Furthermore, when EFO is
individually combined with many of the ontologies it references, we find a large number
of unsatisfiable classes. These results demonstrate a potential problem within a major
ontological ecosystem, and reveal possible disadvantages to the use of the MIREOT
system for developing ontologies.</p>
        <p>IT703:
Semantic Digitization of Experimental Data in Biological Sciences
Saurabh Raghuvanshi
Abstract: A major bulk of published experimental data, referred to as “Gold Standard”
data, is available in a format that cannot be easily accessed by computers unless
effectively curated. Most curation techniques bank on mining the text for information.
Here we propose and demonstrate the efficacy of curating the experimental data itself.
The data models facilitate digitization of every aspect of the information associated
with the experimental data. The models utilize several universally accepted ontologies
as well as in-house developed alphanumeric notations for digitizing different aspects of
the data. The data models have sufficient flexibility to address the extensive variability
in experimental data. They have a very generic nature and can be used to curate and
digitize experimental data from any organism. The digitized data is easily stored in a
relational database management system and can thus be rapidly searched and
integrated. These models have been successfully used to digitize data from over 20,000
experiments spanning over 500 research articles on rice biology. The entire dataset is
available as a database entitled Manually Curated Database of Rice Proteins at
www.genomeindia.org/biocuration.</p>
        <p>IT704:
Representation of parts within the Foundational Model of Anatomy ontology
Melissa Clarkson
Abstract: As biomedical ontologies grow in size and complexity it is crucial to develop
methods for detecting inconsistencies within ontologies. The Foundational Model of
Anatomy (FMA) ontology represents knowledge of human anatomy, with structural
organization provided by class and part relationships. Using a manual audit, I identify
types of inconsistencies arising from class and regional part relationships for regions of
the body and the parts of organs. Inconsistencies arise from both explicitly declared
relationships and relationships that are implied by the lexical constructs of class names.
The purpose of this work is to propose methods of structural organization and lexical
consistency that will make the FMA more compatible with computational auditing and
increase its usability.</p>
        <p>IT705:
A Realist Representation of Social Identity Data
Amanda Hicks
Abstract: Social identities merit special treatment in realist ontologies. Their ontological
status is unsettled, so we should model them in a manner that is agnostic with respect
to their ontological status. Nevertheless, there is a clear criterion for determining
whether a specific person has a particular identity, namely, whether that person asserts
that they do. This social act forms the basis for a realist representation, not of social
identities themselves, but of data about social identities. We report the representation
of social identities in the Ontology of Medically Related Social Entities and show that it
supports data integration and retrieval.</p>
        <p>W14-03:
The Plant Phenology Ontology for Phenological Data Integration
Brian J. Stucky, John Deck, Ellen Denny, Robert P. Guralnick, Ramona L. Walls and
Jennifer Yost
Abstract: Plant phenology, the timing of life-cycle events such as flowering or
leafing out, has cascading effects on multiple levels of biological organization, from individuals
to ecosystems. Despite the importance of understanding phenology for managing
biodiversity and ecosystem services, we are not currently able to address
continent-scale phenological responses to anticipated climatic changes. This is not because we lack
relevant data. Rather, the problem is that the disparate organizations producing
large-scale phenology data are using non-standardized terminologies and metrics during data
collection and data processing. Here, we preview the Plant Phenology Ontology, which
will provide the standardized vocabulary necessary for annotation of phenological data.
We are aggregating, annotating, and analyzing the most significant phenological data
sets in the USA and Europe for broad temporal, geographic, and taxonomic analyses of
how phenology is changing in relation to climate change.</p>
        <p>W14-02:
The Biological Collections Ontology for linking traditional and contemporary
biodiversity data
Ramona Walls and Rob Guralnick
Abstract: Biodiversity data comes from many sources, ranging from museum specimens
to field surveys to genomic sequences. Domain-specific standards provide vocabularies
for many types of these data, but they do not fully support integrating data across
methods, scales, and domains. The Biological Collections Ontology (BCO) was designed
to bridge the terminology gap between traditional museum-based specimen collections
and more contemporary environmental sampling methods, such as metagenomic
sequencing, by providing a logically defined set of terms for biodiversity that map to
standards such as the Darwin Core and Minimum Information about any Sequence. The
BCO is expanding to encompass observational biodiversity data such as field surveys and
taxonomic inventories. A key design principle of the BCO is to clearly distinguish the
different types of processes involved in biodiversity data collection along with the inputs
and outputs of those processes. The BCO has applications to plant biodiversity studies
for linking herbarium specimens to sequence data, connecting trait data to specimens,
and describing survey data.</p>
        <p>BT101:
Cycles of Scientific Investigation in Discourse - Machine Reading Methods for the Primary
Research Contributions of a Paper
Gully A. Burns; Anita de Waard; Pradeep Dasigi; Eduard H. Hovy
Abstract: We describe a novel approach to machine reading of the primary scientific
literature. We treat a description of an experiment as a discourse, viewing a scientific
corpus not merely as a collection of documents, but also as an extended conversation
formed by the collective set of experiments, their introductions and interpretations. This
paper introduces this approach as a methodology called ‘Cycles of Scientific
Investigation in Discourse’ (CoSID). In CoSID, we capture the central conceptual
structure of a paper as a series of nested reasoning loops, composed of passages in
results sections, which describe individual research findings. We ground our work with a
number of worked examples based on data from the MINTACT and Pathway Logic
databases, and illustrate the idea in the context of machine-enabled biocuration.</p>
        <p>BT102:
Collaborative Workspaces for Pathway Curation
Funda Durupinar-Babur; Metin Can Siper; Ugur Dogrusoz; Istemi Bahceci; Ozgun Babur;
Emek Demir
Abstract: We present a web-based visual biocuration workspace, focusing on curating
detailed mechanistic pathways. It was designed as a flexible platform where multiple
humans, NLP and AI agents can collaborate in real time on a common model using an
event-driven API. We will use this platform for exploring disruptive technologies that can
scale up biocuration such as NLP, human-computer collaboration, crowd-sourcing,
alternative publishing and gamification. As a first step, we are designing a pilot to
include an author-curation step in scientific publishing, where the authors of an
article create formal pathway fragments representing their discovery, heavily assisted
by computer agents. We envision that this “micro-curation” use-case will create an
excellent opportunity to integrate multiple NLP approaches and semi-automated
curation.</p>
        <p>BT103:
Crowdsourcing Protein Family Database Curation
Matt Jeffryes; Maria Liakata; Alex Bateman
Abstract: We propose a novel method for crowdsourcing a protein family database. We
discuss how we intend to identify novel groupings of proteins from user sequence
similarity search, and how text mining will be applied to assist in annotation of these
novel groupings, and more broadly as an enrichment of protein sequence similarity
search results.</p>
        <p>BT104:
Opportunities and challenges presented by Wikidata in the context of biocuration
Benjamin Good; Sebastian Burgstaller-Muehlbacher; Elvira Mitraka; Timothy Putman;
Andrew Su; Andra Waagmeester
Abstract: Wikidata is a world readable and writable knowledge base maintained by the
Wikimedia Foundation. It offers the opportunity to collaboratively construct a fully open
access knowledge graph spanning biology, medicine, and all other domains of
knowledge. To meet this potential, social and technical challenges must be overcome,
most of which are familiar to the biocuration community. These include community
ontology building, high precision information extraction, provenance, and license
management. By working together with Wikidata now, we can help shape it into a
trustworthy, unencumbered central node in the Semantic Web of biomedical data.</p>
        <p>BT201:
Text mining to enable routine personalized cancer therapy
Hua Xu
Abstract: Genomic profiling information is frequently available to oncologists, enabling
targeted cancer therapy. Because clinically relevant genomic information is rapidly
emerging in narrative data sources such as biomedical literature and clinical trial
documents, there is a need for text mining technologies to support targeted therapies.
In this talk, we will present two projects on developing text-mining tools to enable
personalized cancer therapy: 1) identifying molecular effects of drugs in
biomedical literature, and 2) creating a knowledge base of cancer treatment trials with
annotations about genetic alterations. We believe such tools would be valuable for
physicians and patients who are seeking information about personalized cancer therapy,
thus facilitating their decision making.</p>
        <p>BT202:
Social Media Mining for Pharmacovigilance
Graciela Gonzalez
Abstract: N/A
BT203:
MutD – A PubMed Scale Resource for Protein Mutation-Disease Relations through
BioMedical Literature Mining
Ravikumar Komandur Elayavilli; Majid Rastegar-Mojarad; Hongfang Liu
Abstract: A large amount of information about the role of gene variants and mutations
in diseases is available in curated databases such as OMIM, ClinVar, and UniProtKB.
However, much of this information remains ‘locked’ in unstructured form in
scientific publications. Since manual curation involves significant human effort and time,
there is always a lag in the information between the curated databases and the
literature. Recent findings published in the literature take significant time to find their
way into the curated knowledge bases. Text mining approaches can accelerate the
process of assembling this knowledge from the published literature. However,
developing a text-mining system with semantic understanding capability in the
biomedical domain is very challenging. In an earlier work, we described MutD, a
literature mining system that extracts relationships between protein point mutations and
diseases from biomedical abstracts. In this abstract, we present access to a PubMed-scale
resource through a web interface that allows users to retrieve protein point
mutation-disease relations extracted through biomedical literature mining.
CancerMine: Knowledge base construction for personalised cancer treatment
Jake Lever; Martin Jones; Steven J.M. Jones
Abstract: Knowledge of the relevant genomic aberrations that drive a particular cancer
type is necessary to accelerate efficient interpretation of genomic data and enable
large-scale endeavors in precision medicine. Currently, this field is limited by the lack of
focused and scalable literature curation tools that can reliably capture the required
information. Here we present a knowledge base of genes that have been described in
the literature as drivers, oncogenes or tumour suppressors with respect to a specific
type of cancer. We have annotated a large body of literature which reports oncogenic
aberrations using a custom-designed annotation tool. We then applied VERSE, an
in-house relation extraction tool, to catalogue driver mutations and illustrate the ability to
build a useful resource for clinical interpretation of genomic data for personalized
treatment approaches.</p>
        <p>Text Mining for Drug Development: Gathering Insights to Support Decision Making
Sherri Matis-Mitchell
Abstract: Drug discovery in Pharma R&amp;D is an information-driven process requiring
many disparate bits of data from many different sources, both structured and
unstructured. Text mining is the key methodology used to extract entities and
relationships from unstructured text in the quest for the knowledge needed to bring a
safe and effective drug to market and beyond. Much of the insight needed in early drug
research to identify drug target to disease relationships and progress a potential drug
target comes from published literature and internal reports. Later-stage drug
development requires many additional sources of information including case reports,
clinical trials, competitive intelligence and other diverse sources. In this publication, I
will present four use cases showing how text mining is used to drive decision making in
drug discovery and development and also how it can be used to identify patient insights
from sources such as social media.
BT301:
NLP for the Institute: Developing and Deploying an NLP Capability to Accelerate Cancer
Research
Aaron Cohen
Abstract: "It has been well documented that a great deal of data useful for medical
research is present in clinical narrative text. There is perhaps less discussion about how
often what was structured data at its origin has become inaccessible except in free text
form. This problem is further compounded in tertiary care institutions, like the OHSU
Knight Cancer Institute, where the entire history of a referred patient's condition may
only be present in the electronic health record (EHR) as free text. At the same time,
future medical advances, such as in cancer research, will require much more complete
patient data than has been previously available. Such advances include the discovery of
new cures, expanding early detection, and realizing the promise of precision medicine.
Phenotype description and outcome characterization are two areas in particular where
text sources could greatly supplement our current data. The OHSU Knight Cancer
Institute has begun a program to create a natural language processing (NLP) capability
to extract, store, and link data from free text sources at the patient level, and make this
data available to researchers in a continuous, reusable, efficient and timely manner
through services delivery from the Translational Research Hub (TRH). This talk will
present the challenges, progress, and future goals of our program to build NLP
capabilities that can help us use free text from the EHR to first support the
transformation of cancer research with the hope of positively impacting clinical care in
the future."
BT302:
Annotations for biomedical research and healthcare -- Bridging the gap
Olivier Bodenreider
Abstract: "Characterizing protein products from various model organisms with Gene
Ontology terms, indexing the biomedical literature with MeSH descriptors, and coding
clinical data with ICD-10-CM all constitute examples of annotation tasks, i.e., the
extraction and summarization of knowledge related to a biological entity, article or
patient, in reference to some controlled vocabulary or ontology. However, the
annotations made in biomedical research and healthcare environments tend to rely on
different terminologies and ontologies, making it difficult to reconcile these annotations
for translational research purposes. We will discuss how terminology integration
systems, such as the Unified Medical Language System (UMLS) and BioPortal, can help
bridge the gap between annotations made by biomedical researchers and physicians,
and argue that more efforts are needed to foster interoperability between the resources
developed by these two communities."
BT303:
PubAnnotation: a public shared platform for scientific literature annotation.
Jin-Dong Kim
Abstract: "In the last decade, the technology for biomedical literature annotation has made
significant progress in terms of accuracy and speed. Now, some annotation systems
claim that they have reached a production level. However, there still remain critical
issues which we believe hinder further progress of the community. Among them, a
relatively well-known issue is the "interoperability" of annotation resources. We also
recognize that the community is missing a general solution for "storage infrastructure".
The talk will present the PubAnnotation project which aims at addressing these two
issues. In the end, a new model for "sustainable shared tasks", which is implemented on
PubAnnotation, will be introduced as well."
BioCconvert: A Conversion Tool between BioC and PubAnnotation
Donald C. Comeau; Rezarta Islamaj Doğan; Sun Kim; Chih-Hsuan Wei; W. John Wilbur;
Zhiyong Lu
Abstract: BioC is a simple XML data format for text, annotations, and relations.
PubAnnotation is a repository of text annotations focused on the life science literature.
BioCconvert, a conversion tool between BioC XML and the JSON import/export format of
PubAnnotation, has been developed. As a demonstration, the Ab3P gold
standard abbreviation annotations are being made available through PubAnnotation.
Ontology of the Organigram
B. Smith
Abstract: Basic Formal Ontology (BFO) is a domain-neutral top-level ontology designed
to serve as the starting point for development of domain ontologies designed to
support the consistent annotation not only of scientific research data but also of data
arising through clinical practice, hospital administration, and regulatory oversight. In
each of these areas data are generated relating to what are called deontic entities:
obligations, duties, contracts, permissions, consents, licenses, and so forth. I will sketch
how we can understand entities of these sorts within the BFO framework.
W04-02
Relations between Institutional Roles and Deontic Roles in Biomedical Organizations
Otte J.N., Brochhausen M.</p>
        <p>Abstract: Doctors, nurses, surgeons, and other healthcare professionals are bearers of
institutional roles (e.g. physician role), and it is common to think of these roles as having
deontic powers as parts. This is reflected in sentences like: "Part of my job involves an
obligation to oversee patient care" and "Doctors at this hospital have the following
privileges". The Document Act Ontology (d-acts;
http://purl.obolibrary.org/obo/iao/dacts.owl) enables us to represent how deontic powers are created by document acts.
Deontic powers are realizable entities that we treat as deontic roles (e.g. obligor role).
This means deontic powers are roles themselves. In this talk, we turn to the question of
the relationships between institutional roles and deontic roles. We propose that
institutional roles can have deontic roles as parts. We illustrate how different kinds of
deontic powers give rise to different parthood relations, and we conclude by arguing
that such relations can better capture the nature of organizational structure than the
traditional reliance on institutional roles alone.</p>
        <p>W04-03
An ontological study of healthcare corporations and their social entities
M.B. Almeida
Abstract: Healthcare corporations have a primary goal to provide high-quality health
care to those who seek their assistance. The quality and safety of care provided by a
healthcare corporation depends on many factors involving both medical and business
decisions. One crucial factor in how healthcare corporations function is information
management, mainly performed through information systems, which process
information for both clinical decision making and for fulfilling internal and external legal
obligations. In this lecture, we propose a sketch of an ontology-based model for
healthcare corporations with the aim of facilitating the coordination of the information
systems involved in either medical or management activities. In order to accomplish
this, we focus on three efforts: i) to shed some light on the ontological status of
corporations; ii) to clarify the relations that exist between the corporation as a whole and
both its members and units; iii) to explain how a corporation administers its duties and
responsibilities on behalf of the people who compose it.
W04-04
Who's paying and who's graying? The organizations and roles associated
with insurance policies, funding agencies, and national census data
A. Hicks &amp; W.R. Hogan
Abstract: Many social roles are necessarily related to organizations. Employee roles and
primary insured roles are just a few examples. However, the relations between such
roles and organizations have yet to be systematically worked out in the context of
BFO-based ontologies. As an initial step toward developing such a systematic representation,
we present use-case driven representations connecting roles to organizations in the
context of insurance policies, scientific grants, and U.S. National Census data in
OWL/RDF.
Dealing with social and legal entities in the obstetric and neonatal domain
F. Farinelli &amp; M.B. Almeida
Abstract: OntONeo is an ontology for the obstetric and neonatal domain, which has
been created to provide a consensus representation of salient electronic health record
(EHR) data and to support the interoperability of the associated data and information
systems. Regardless of medical specialty, every EHR generally deals with data about the
entities involved in a health care appointment. We observed the existence of at least three
actors: the health care facility, the physician, and the patient. Besides dealing with the social
entities that participate in a health care appointment, we must also deal with the
characteristics that define them and the events that are involved. Here, we demonstrate
the utility of ontologies of social entities in the obstetric and neonatal domain. We
present how OntONeo deals with the material entities that perform or participate
in a health care encounter. The definition of these actors plus their related
characteristics and events will also contribute to enabling the interoperability of
information among EHRs from different specialties. OntONeo is being developed with an
approach based on ontological realism and the principles of the OBO Foundry, including
reuse of reference ontologies. The ontologies we reuse include, for
instance, OMRSE, d-acts, PATO and OBI.</p>
        <p>W01-01
Big Data Visual Analysis
Johnson C.</p>
        <p>Abstract: We live in an era in which the creation of new data is growing exponentially
such that every two days we create as much new data as we did from the beginning
of mankind until the year 2003. One of the greatest scientific challenges of the
21st century is to effectively understand and make use of the vast amount of
information being produced. Visual data analysis will be among our most important
tools to understand such large and often complex data. In this talk, I will present
state-of-the-art visualization techniques, applied to important Big Data problems in science,
engineering, and medicine.
Topology-Driven Data Visualization of Large-Scale Tensor Fields
Zhang Y.</p>
        <p>Abstract: Spatiotemporally varying simulation data sets are growing at a scale that
traditional visualization techniques are inadequate to handle. In this talk, we review
topology-driven techniques which focus on extracting the key (topological) features in
3D symmetric tensor fields, which can provide a more compact visualization of the data.
Computer Vision for Next Generation Phenomics and Tree of Life
Todorovic S.</p>
        <p>Abstract: To build the Tree of Life, scientists collect data on all heritable features, both
genotypes (e.g., DNA sequences) and phenotypes (e.g., anatomy, behavior, physiology),
for all living and extinct species. The collection of phenomic data for tree-building has
lagged far behind the collection of genomic data. Advances in computer vision have the
potential to change this situation. In this talk, I will present a computer vision system
developed in our lab for extending phenomic matrices, and in this way building the Tree
of Life. Rows of a phenomic matrix represent images of specimens belonging to various
species of interest, and columns represent scores of their phenomic characters. Given
a phenomic matrix where only a few rows are manually annotated with character scores,
our vision system extends the matrix row-wise by populating missing character scores of
the remaining species in the matrix. The talk will present our experimental results on
scoring phenomic characters in images of bat skulls, nematocysts, and leaves, available
in the MorphoBank and BisQue data repositories.</p>
        <p>W01-05
Uncertainty analysis and visualization in Large-Scale Vector Fields
Zhang E.</p>
        <p>Abstract: Vector fields are omnipresent in science, engineering, and medicine. Due to
the big-data nature of simulated vector field data, uncertainty in the data can lead to
difficulties in their physical interpretations. In this talk, we will describe how
Morse decomposition can serve as a means to quantify and control the uncertainty in the data.
Telling a genome's story graphically
Lewis S.</p>
        <p>Abstract: Scientific research is inherently a collaborative task; a dialog among different
researchers to reach a consensus understanding of the underlying biology. Information
graphics facilitate this dialog because humans are visually wired and can absorb
well-executed graphics more quickly than they can absorb the written word or columns of
numbers. Visual representations communicate complex ideas quickly and clearly.
Why We Need A Food Ontology
Damion Dooley
Abstract: The need to represent knowledge about food is central to many fields such as
health, food safety, nutrition, food allergy, sustainable development, trade and
ecosystems. Academic, national, provincial and departmental databases are all silos of food
terminology and data models. Several resources and standards exist for indexing food
descriptors; however, their content and architecture are not semantically and logically
coherent. Here we present a unified approach to developing a Farm-to-Fork food
ontology which will facilitate data sharing and interoperability between different health,
regulatory, development and research communities worldwide.
International Efforts in Creating Food Vocabularies
Robert Hoehndorf and Matthew Lange
Abstract: A standardised system for classifying and describing food makes it easier to
compare data from different sources and perform more detailed types of data analyses.
As such, there have been many agency-specific, project-specific and international efforts
to create food vocabularies fit for different purposes. Here we describe the different
existing and ongoing efforts.</p>
        <p>W11-03
Sustainable food systems and food in ecosystems
Pier Luigi Buttigieg
Abstract: This brief talk will outline the need for a global food ontology to flexibly
represent food across human and natural ecosystems. From an anthropocentric point of
view, the sustainability and resilience of the global food system - including the
sustainability of ecosystems and human-made networks which support it - should be
closely interlinked with entities which realise food roles and the global policy objectives
to secure food supply for all. From a more "natural" point of view, a truly global food
ontology should be flexible enough to link taxa (including humans) to their consumers
via the simultaneous realisation of prey, detrital, and food roles. This feature would
provide a semantic basis to model food webs and, in combination with compositional
inventories, nutritional profiles for ecoinformatics. These anthropogenic and natural
perspectives will inevitably converge as a biospheric representation of trophic patterns
emerges, a process which a flexible food ontology can greatly accelerate. Vitally, these
aims will require coordination across multiple established and emerging ontologies to
be feasible in the long term and a number of potential synergies with the Environment
Ontology, the Agronomy Ontology, and the Sustainable Development Goal Interface
Ontology will be proposed.</p>
        <p>W11-04
Food composition and links to human health
Miguel Rodriguez Garcia
Abstract: Data on the composition of foods are essential for a diversity of purposes. They
provide detailed sets of information on the nutritionally important components such as
proteins, carbohydrates, vitamins and minerals. In this talk, we will describe our
approach to employ ontologies to identify food components in recipes described in
natural language. By combining several public data sources, we link the food
components to chemical compounds and their physiological and pathological effects.
Abstract: Globalization of food manufacturing, distribution and consumption requires
food microbiology testing programs to perform high-throughput pathogen diagnostics
within a short time frame. Whole-genome sequencing (WGS) can now assemble and
type bacterial genomes in near real-time with improved resolution, making WGS a
viable diagnostic tool during foodborne illness outbreak investigations. However,
genomic data must be combined with epidemiological, clinical, laboratory and other
health care data (contextual data) to be meaningfully interpreted for actionable
interventions. Canada’s Integrated Rapid Infectious Disease Analysis (IRIDA) project
includes the development of a Genomic Epidemiology Application Ontology (GenEpiO),
with the development of a standardized food vocabulary as a priority area. Standardized
food descriptors are essential for data sharing between public health agencies and
health responders, accreditation and reproducibility of WGS pipelines, source
attribution and risk assessment.
FoodON Use cases: Caution! Food Allergies Ahead
Emma Griffiths
Abstract: Millions of people worldwide live with food allergies, including all those at risk
for life-threatening anaphylaxis. The lack of a standardized food vocabulary impacts
food source risk assessment, food hazard control, consistent food allergy policy
implementation and food-allergy research. The Canadian Healthy Infant Longitudinal
Study (CHILD) examines causal factors of asthma and allergy during childhood
development. The development of FoodON will benefit food allergy research by
standardizing food descriptors across child cohorts, enabling the correlation of food
antigens with biological causation of immune response, and streamlining guidelines for
parents.
IC3-Foods: An infrastructure for the next generation internet of food systems, food, and
health.</p>
        <p>Matthew Lange
Abstract: IC3-FOODS, The International Conference/Consortium/Center for Food
Ontology, Operability, Data and Semantics, is a new effort at UC Davis, assembling
ontological and semantic infrastructure components for the next-generation internet of
food and health. It consists of 3 specific efforts:
i) The International Conference for FOODS assembles stakeholders desiring to integrate
data and informatics systems currently residing along the Environment⇔Ag⇔Food⇔
Diet⇔Health knowledge spectrum, into the:
ii) International Consortium of FOODS, which maintains membership of representative
stakeholders from academia, industry and (non-)governmental organizations to guide
research priorities and development trajectories carried out by:
iii) The International Center for FOODS, whose mission consists of hosting the
I-Conference-FOODS, administering the I-Consortium-FOODS, and designing,
assembling, and coordinating ontological and infrastructure underpinnings for them.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>