=Paper= {{Paper |id=Vol-1747/IP07_ICBO2016 |storemode=property |title=The Cell Line Ontology Integration and Analysis of the Knowledge of LINCS Cell Lines |pdfUrl=https://ceur-ws.org/Vol-1747/IP07_ICBO2016.pdf |volume=Vol-1747 |authors=Edison Ong,Jiangan Xie,Zhaohui Ni,Qingping Liu,Yu Lin,Vasileios Stathias,Caty Chung,Stephan Schurer,Yongqun He |dblpUrl=https://dblp.org/rec/conf/icbo/OngXNLLSCSH16 }} ==The Cell Line Ontology Integration and Analysis of the Knowledge of LINCS Cell Lines == https://ceur-ws.org/Vol-1747/IP07_ICBO2016.pdf
     The Cell Line Ontology integration and analysis of
            the knowledge of LINCS cell lines
 Edison Ong1, Jiangan Xie1, Zhaohui Ni1, Qingping Liu1, Yu Lin2, Vasileios Stathias2, Caty Chung2,
                                Stephan Schurer*2, Yongqun He*1
          1 University of Michigan, Ann Arbor, Michigan, USA; 2 University of Miami, Coral Gables, FL, USA


    Abstract— Cell lines are crucial to study molecular signatures
and pathways, and are widely used in the NIH Common Fund                B. CLO modeling and design pattern generation
LINCS project. The Cell Line Ontology (CLO) is a community-                 Based on the data types obtained from the mapping
based ontology representing and classifying cell lines from             process, an updated CLO design pattern model was generated
different resources. To better serve the LINCS research                 in order to accommodate new LINCS cell line data attributes.
community, from the LINCS Data Portal and ChEMBL, we
identified 1,097 LINCS cell lines, among which 717 cell lines were      C. New information incorporation into CLO
associated with 121 cancer types, and 352 cell line terms did not           Based on the new design patterns, Ontorat
exist in CLO. To harmonize LINCS cell line representation and
                                                                        (http://ontorat.hegroup.org) was used to incorporate LINCS
CLO, CLO design patterns were slightly updated to add new
information of the LINCS cell lines including different database        cell line data from different data sources to CLO. Manual
cross-reference IDs. A new shortcut relation was generated to           checking was performed to ensure correctness.
directly link a cell line to the disease of the patient from whom       D. Generation and analysis of a LINCS cell line set of CLO
the cell line originated. After new LINCS cell lines and
related information were added to CLO, a CLO subset/view                    OntoFox (http://ontofox.hegroup.org) was used to generate
(LINCS-CLOview) of LINCS cell lines was generated and                   a CLO subset (LINCS-CLOview) that includes all LINCS cell
analyzed to identify scientific insights into these LINCS cell lines.   lines, as shown here: https://raw.githubusercontent.com/CLO-
This study provides a first-time use case on how CLO can be            ontology/CLO/master/src/ontology/LINCS-CLOview.owl. The
updated and applied to support cell line research from a specific       LINCS CLO subset was also submitted to Ontobee
research community or project initiative.                               (http://www.ontobee.org). The information of the subset was
                                                                        visualized using Protégé OWL editor, queried using Ontobee
    Keywords— Cell line, cell, ontology, CLO, LINCS, ChEMBL             SPARQL web program, and further analyzed.
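As a rough illustration of the kind of analysis that can be run on a generated subset file, the sketch below tallies the term types declared in an OWL file serialized as RDF/XML. It is not the authors' workflow (they used Protégé and the Ontobee SPARQL interface). The function name `count_terms`, the inline `SAMPLE` document, and the `example.org` IRIs are assumptions made for illustration; only CLO_0003684 (HeLa cell) comes from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Map fully qualified element names to the term kinds reported in the paper.
TERM_TAGS = {
    f"{{{OWL}}}Class": "class",
    f"{{{OWL}}}ObjectProperty": "object property",
    f"{{{OWL}}}AnnotationProperty": "annotation property",
    f"{{{OWL}}}NamedIndividual": "instance",
}


def count_terms(owl_xml: str) -> Counter:
    """Count named top-level term declarations in an RDF/XML string."""
    root = ET.fromstring(owl_xml)
    counts = Counter()
    for child in root:  # top-level declarations only
        kind = TERM_TAGS.get(child.tag)
        # Anonymous class expressions carry no rdf:about and are skipped.
        if kind and child.get(f"{{{RDF}}}about"):
            counts[kind] += 1
    return counts


# Tiny hypothetical document; a real run would read LINCS-CLOview.owl instead.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/CLO_0003684"/>
  <owl:Class rdf:about="http://example.org/some_other_class"/>
  <owl:ObjectProperty rdf:about="http://example.org/derived_from"/>
  <owl:AnnotationProperty rdf:about="http://www.w3.org/2000/01/rdf-schema#seeAlso"/>
</rdf:RDF>"""

if __name__ == "__main__":
    for kind, n in sorted(count_terms(SAMPLE).items()):
        print(f"{kind}: {n}")
```

Counting only top-level elements with an `rdf:about` attribute is a simplification; other OWL serializations or nested declarations would need a proper OWL parser.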

                           I.         INTRODUCTION                                                      III.       RESULTS
    The NIH Common Fund Library of Integrated Network-
based Cellular Signatures (LINCS) program aims to create a              A. LINCS cell line information extraction and mapping from
network-based biological understanding of gene expression                   different resources
and cellular processes when cells are exposed to various                    As of April 15, 2016, 1,097 cell lines were extracted from
perturbing agents (http://www.lincsproject.org/). Over 1000             the LINCS Data Portal. Among these LINCS cell lines, 794
cell lines have been used in LINCS and play a critical role as          cell lines could be directly mapped to CLO. Meanwhile,
disease model systems to produce molecular and cellular                 ChEMBL included 637 cell line entries with LINCS IDs.
signatures and networks.                                                Among these, 451 cell lines have CLO_IDs, and 51 out of the
    The Cell Line Ontology (CLO) is a community-based                   remaining 186 cell lines could be mapped to CLO using name
ontology system for representing cell lines [1]. The overall            matching. The data types available related to these cell lines in
goal of this study is to use CLO to represent and integrate the         the LINCS Portal and ChEMBL are shown in Fig. 1.
knowledge of LINCS cell lines in order to power LINCS cell                                                           - cell ID
lines’ integrity across multiple resources.                                             - cell line (CL) name        - cell line name
                                                                                                                     - cell description
                                                                                        - CL LINCS ID
                                                                                        - CL alternate name          - cell source tissue
                                                                                        - CL provider name           - cell source organism
                                II.     METHODS                                         - CL provider catalog ID     - cell source tax ID
                                                                                        - CL organ                   - CLO ID
A. Information extraction and data mapping                                              - CL organism                - EFO ID
                                                                                        - CL disease                 - cellosaurus ID
    Two sources, including the LINCS Data Portal                                        - CL disease DO ID           - CL LINCS ID
                                                                                                                     - ChEMBL ID
(http://lincsportal.ccs.miami.edu/entities/) and ChEMBL [2],                              (A) LINCS data portal
                                                                                                                     (B) ChEBML data types
were used to obtain LINCS cell line information. The data in                                    data types
these two sources were compared and mapped to the CLO
                                                                        Fig. 1. Cell line-related data types of the data downloaded from LINCS Data
knowledge base, and new information was identified.                     Portal and ChEMBL. (A) Data types from LINCS Data Portal. (B) Data types
                                                                        from ChEMBL. Red-highlighted items (e.g., ChEMBL ID) were not covered
                                                                        in CLO, which were added later to CLO in this study.
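The compare-and-map step described in Methods A can be sketched as follows: use an existing CLO ID when a record carries one, otherwise fall back to matching on a normalized name. This is a minimal sketch under assumptions, not the authors' procedure. The normalization rule, the function names, and the `CLO_XXXXXXX` placeholder ID are hypothetical; only the HeLa identifier CLO_0003684 is taken from the paper.

```python
import re


def normalize(name: str) -> str:
    """Drop a trailing 'cell', lower-case, and remove non-alphanumerics."""
    name = re.sub(r"\s+cell$", "", name.strip(), flags=re.IGNORECASE)
    return re.sub(r"[^a-z0-9]", "", name.lower())


def map_to_clo(records, clo_labels):
    """Map cell line records to CLO IDs.

    records:    list of dicts with 'name' and an optional 'clo_id'
    clo_labels: dict of CLO ID -> CLO label
    Returns (mapped, unmapped), where mapped[name] = (clo_id, method).
    """
    by_name = {normalize(label): clo_id for clo_id, label in clo_labels.items()}
    mapped, unmapped = {}, []
    for rec in records:
        if rec.get("clo_id"):
            mapped[rec["name"]] = (rec["clo_id"], "id")
        elif normalize(rec["name"]) in by_name:
            mapped[rec["name"]] = (by_name[normalize(rec["name"])], "name")
        else:
            unmapped.append(rec["name"])  # candidate new CLO term
    return mapped, unmapped


# Hypothetical input; CLO_XXXXXXX is a placeholder, not a real identifier.
RECORDS = [
    {"name": "HeLa", "clo_id": "CLO_0003684"},
    {"name": "MCF-10A"},
    {"name": "Hypothetical-XYZ"},
]
CLO_LABELS = {"CLO_0003684": "HeLa cell", "CLO_XXXXXXX": "MCF 10A cell"}

if __name__ == "__main__":
    mapped, unmapped = map_to_clo(RECORDS, CLO_LABELS)
    print(mapped)
    print(unmapped)
```

Aggressive normalization of this kind can produce false matches between similarly named lines, which is one reason name-based mappings would still need the manual checking the paper mentions.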
    *: corresponding authors: stephan.schurer@gmail.com; and
yongqunh@med.umich.edu
   Among the total of 1097 LINCS cell lines each with a                                            MCF 10A and MCF 10F cells). The other 118 diseases are
unique LINCS cell line ID (e.g., LCL-1512 for HeLa cell), 466                                      various types of cancers. The hierarchical structure of these
have ChEMBL, LINCS, and CLO IDs, 279 have LINCS and                                                diseases under the Disease Ontology (DOID) also helped the
CLO IDs, and 352 LINCS cell lines do not have any CLO IDs.                                         understanding of all the diseases associated with LINCS cell
                                                                                                   lines. For example, 19 LINCS cell lines (e.g., HeLa cell) were
B. CLO modeling and design pattern generation                                                      derived from patients with cervical adenocarcinoma, 4 with
    To represent the new database information to a specific cell                                   cervical clear cell adenocarcinoma (a specific type of cervical
line (Fig. 1), we used ‘seeAlso’ relation. For example, for the                                    adenocarcinoma), and 14 with cervical squamous cell
HeLa cell (CLO_0003684), we added to CLO: ‘Cell line                                               carcinoma. These diseases all belong to cervix carcinoma.
LINCS ID: LCL-1512’ and ‘seeAlso: EFO: EFO_0001185;                                                    We also examined the tissue and organ types from which
CHEMBL: CHEMBL3308376; CVCL: CVCL_0030’.                                                           the LINCS cell lines were derived. In CLO, the multi-species
    To more conveniently link a specific cell line and a disease,                                  anatomy ontology UBERON is used to represent tissues and
we have also generated a new shortcut relation ‘derived                                            organs. In total 131 UBERON terms have been used in
originally from patient with disease’ (Fig. 2).                                                    LINCS-CLOview to refer to various anatomic locations from
                                                                                                   which LINCS cell lines were derived.
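The shortcut relation can be read as a composition of the longer chain of relations in the Fig. 2 design pattern. The sketch below derives the shortcut from that chain over a toy set of triples for the HeLa example. It is an illustration only: the relation directions follow one reading of Fig. 2, the helper names are hypothetical, and plain Python sets stand in for OWL axioms and a reasoner.

```python
SHORTCUT = "derived originally from patient with disease"

# Toy triples (subject, relation, object) for the HeLa example in Fig. 2(B).
TRIPLES = {
    ("HeLa cell", "is a",
     "immortal human uterine cervix-derived epithelial cell line cell"),
    ("immortal human uterine cervix-derived epithelial cell line cell",
     "derived from", "human uterine cervix-derived epithelial cell"),
    ("human patient with cervical adenocarcinoma", "has part",
     "human uterine cervix-derived epithelial cell"),
    ("human patient with cervical adenocarcinoma", "has disease",
     "cervical adenocarcinoma"),
}


def objects(triples, subject, relation):
    """All objects o such that (subject, relation, o) is asserted."""
    return {o for s, r, o in triples if s == subject and r == relation}


def subjects(triples, relation, obj):
    """All subjects s such that (s, relation, obj) is asserted."""
    return {s for s, r, o in triples if r == relation and o == obj}


def infer_shortcuts(triples):
    """Follow is a -> derived from -> (inverse) has part -> has disease."""
    inferred = set()
    for cell_line, relation, parent in triples:
        if relation != "is a":
            continue
        for cell in objects(triples, parent, "derived from"):
            for patient in subjects(triples, "has part", cell):
                for disease in objects(triples, patient, "has disease"):
                    inferred.add((cell_line, SHORTCUT, disease))
    return inferred


if __name__ == "__main__":
    for triple in infer_shortcuts(TRIPLES):
        print(triple)
```

Asserting the shortcut directly, as the paper does, avoids materializing the intermediate patient and cell classes for every cell line.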
[Fig. 2 diagram]
(A) General pattern: cell line cell --is a--> cell line cell from an anatomical structure in organism --derived from--> cell from an anatomical structure in organism; organism having a disease --has part--> cell from an anatomical structure in organism; organism having a disease --has disease--> disease; shortcut: cell line cell --derived originally from patient with disease--> disease.
(B) Example: HeLa cell --is a--> immortal human uterine cervix-derived epithelial cell line cell --derived from--> human uterine cervix-derived epithelial cell; human patient with cervical adenocarcinoma --has part--> human uterine cervix-derived epithelial cell; human patient with cervical adenocarcinoma --has disease--> cervical adenocarcinoma; shortcut: HeLa cell --derived originally from patient with disease--> cervical adenocarcinoma.

    The cell types of LINCS cell lines were analyzed. The Cell
Type Ontology (CL) was used in CLO to demonstrate the cell
types of different cell lines. In total, 43 CL cell types, such as
epithelial cell, B cell, and T cell, are included in LINCS-CLOview.
Each of these cell types is linked to different cell
line cells. For a project to study cellular signatures related to a
specific cell type, the LINCS-CLOview provides a feasible
method to identify which cell line cells to use.

                             IV.    DISCUSSION
    This article is the first report of developing a CLO
community view to serve a specific community, in this case,
the LINCS research community. Since tens of thousands of cell
lines have been represented in CLO, it is inefficient to use the
whole CLO for LINCS cell line related research. The
Fig. 2. CLO design pattern model for using the new shortcut relation ‘derived                      generation of LINCS-CLOview allows standardization and
originally from patient with disease’. (A) General design pattern; (B) an
example to illustrate the design pattern. The shortcut relation makes it more                      modularization of the LINCS cell lines, which facilitates the
efficient to represent the relation between a cell line cell and a disease when the                better analysis and reuse of the LINCS cell line information.
parent term of the cell line cell includes sufficient information about the cell
type and tissue/organ. In this illustration, the classes as shown in the dotted
boxes are redundant and are not needed.                                                                                  ACKNOWLEDGMENT
                                                                                                       This work was supported by grant U54HL127624 (BD2K
C. New data integration to CLO and CLO subset generation                                           LINCS Data Coordination and Integration Center, DCIC)
   Based on the mapping and the design pattern models (Fig.                                        awarded by the National Heart, Lung, and Blood Institute
1 and 2), extra data available in the LINCS Data Portal and                                        through funds provided by the trans-NIH LINCS Program and
ChEMBL were integrated into CLO.                                                                   the trans-NIH Big Data to Knowledge (BD2K) initiative
                                                                                                   (http://www.bd2k.nih.gov). LINCS is an NIH Common Fund
    A CLO subset of LINCS cell lines (LINCS-CLOview) was                                           project. This project was also supported by a BD2K-LINCS
also generated. LINCS-CLOview can be considered as a CLO                                           DCIC external data science research award.
“community view” [3] for the LINCS research community. As
of May 1, 2016, LINCS-CLOview contained 1,924 terms,
                                                                                                                              REFERENCES
including 1,825 classes, 25 object properties, 61 annotation
properties, and 13 instances. These terms include 1,315 terms                                      [1] S. Sarntivijai, Y. Lin, Z. Xiang, T. F. Meehan, A. D. Diehl, U.
with CLO IDs. The other terms were imported from 17 other                                              D. Vempati, et al., "CLO: The Cell Line Ontology," J Biomed
ontologies. Detailed statistics of LINCS-CLOview are available at:                                     Semantics, vol. 5, p. 37, 2014.
http://www.ontobee.org/ontostat/LINCS-CLOview.                                                     [2] A. Gaulton, L. J. Bellis, A. P. Bento, J. Chambers, M. Davies,
                                                                                                       A. Hersey, et al., "ChEMBL: a large-scale bioactivity database
                                                                                                       for drug discovery," Nucleic Acids Res, vol. 40, pp. D1100-7,
D. Analysis of LINCS cell lines by querying LINCS-CLOview                                              Jan 2012.
   With the availability of LINCS-CLOview, we were able to                                         [3] J. Zheng, Z. Xiang, C. J. Stoeckert, Jr., and Y. He, "Ontodog: a
analyze LINCS cell lines from different aspects.                                                       web-based ontology community view generation tool,"
                                                                                                       Bioinformatics, vol. 30, pp. 1340-2, Feb 1 2014.
   Our study found that LINCS cell lines are associated with
121 diseases. These 121 diseases include three benign
neoplasms, i.e., breast fibrocystic disease (associated with