=Paper= {{Paper |id=Vol-1327/19 |storemode=property |title=Making Biomedical Data Usable: NIH Community-Based Data and Metadata Standards Efforts |pdfUrl=https://ceur-ws.org/Vol-1327/icbo2014_paper_70.pdf |volume=Vol-1327 |dblpUrl=https://dblp.org/rec/conf/icbo/DearryLBHM14 }} ==Making Biomedical Data Usable: NIH Community-Based Data and Metadata Standards Efforts== https://ceur-ws.org/Vol-1327/icbo2014_paper_70.pdf
ICBO 2014 Proceedings

Making Biomedical Data Usable: NIH Community-Based Data and Metadata Standards Efforts
Allen Dearry, Cindy Lawler, Rebecca Boyles, Astrid Haugen, Mike Huerta
National Institutes of Health

Abstract

The mission of the NIH Big Data to Knowledge (BD2K) initiative is to enable biomedical scientists to capitalize more fully on the Big Data being generated by research communities. With advances in technologies, these investigators are increasingly generating and using large, complex, and diverse datasets. However, the ability of researchers to locate, analyze, and use Big Data (and more generally all biomedical and behavioral data) is often limited for reasons related to access to relevant software and tools, expertise, and other factors. BD2K aims to develop the new approaches, standards, methods, tools, software, and competencies that will enhance the use of biomedical Big Data by supporting research, implementation, and training in data science and other relevant fields. One initiative within BD2K is to establish community-driven frameworks for developing and using standards for data and metadata. Such standards enable broad data sharing and reuse of data generated across the full spectrum of NIH-relevant research, from single investigators conducting R01-driven research to large collaborative networks and consortia. Standards for the metadata that describe the samples and experiments associated with the data, in addition to standards for each of the data types themselves, would greatly facilitate (and are probably even required for) large-scale data sharing and data integration. NIH should help establish flexible frameworks for developing data and metadata standards for newly emerging data types that are expected to be used widely, thereby encouraging various biomedical research communities to develop such standards in coordinated ways. Priorities for standardization should be community-driven. Standards should be applicable to both research and clinical data, where appropriate. It will be necessary to address a range of issues, including developing common data formats and data elements for particular types of studies and linking established care standards to meaningful use standards for electronic health records (EHRs), to the extent possible.

This poster describes the process that NIH is initiating to guide the support and development of community-based standards.

Background

Big Data to Knowledge (BD2K): Overview

Overarching goal:
By the end of this decade, enable a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data

The BD2K initiative addresses four major aims that, in combination, are meant to enhance the utility of biomedical Big Data:
• To facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable.
• To conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data.
• To enhance training in the development and use of methods and tools necessary for biomedical Big Data science.
• To support a data ecosystem that accelerates discovery as part of a digital enterprise.

Making Data Useable

Make Data Broadly Useable
Standards allow data to work with:
• Other data
• Software tools
• Data resources

NIH Standards Information Resource
• Encourage the adoption of existing, widely used standards
• Discourage unnecessary duplication of effort / reinventing wheel

Support community-based standards efforts
• Standards are used when community wants & supports them
• BD2K will develop routine ways to provide time limited support for particularly opportune community-based standards efforts

Purpose

Community-Based Data and Metadata Standards
The purpose of this initiative is to accomplish three main goals:
1) establish an internal NIH framework of policies, governance, administrative procedures, and funding to routinely support community-based standards efforts;
2) use that framework to provide catalytic extramural research support for particularly opportune efforts under BD2K, that are broadly relevant to NIH research; and
3) integrate the framework for standards development into other BD2K activities to identify and capitalize on potential synergisms. The framework for standards development will include catalytic support, in the form of time-limited financial assistance, for convening, organizing, and logistics toward facilitating a community of practice that addresses well-formulated standards-related needs that may include creation or extension of a standard.

Planning Activities

2013 NIEHS/EPA Language Workshop
Purpose:
• Learn about standard language efforts in the field of environmental health science.
• Discuss the way forward for environmental health sciences terminology.
• Develop a local community of standard language expertise within the environmental health sciences.
Findings:
• Active data stewardship/curation adds value and is needed at some level; but we have no funding model to support data stewards and no way to measure the value of their contributions compared to, e.g., new research grants.
• Sociological barriers to data sharing (need for “culture change”) within and across communities are as serious as technological barriers.
• Many community-driven and community-developed standards already exist and more are being developed; these different solutions are just starting to meet at the interfaces between research disciplines.

2013 BD2K Stds Framework Workshop
Mapping the Landscape of Community Standards
Formulating, Conducting and Maintaining Community-Based Standards Efforts
• How is the need for a particular standards effort identified?
• What is the process used to assess and prioritize selected activities?
• How do participants contribute to the standards effort?
• What are the characteristics of the ongoing discussions/meetings?
• Are milestones or similar indicators of progress used, and if so, how?
• How is the product of the standards effort updated and assessed?

RFIs for Community Input
Information resources for data-related standards
• Collect, organize, and make available trusted, systematically organized, and curated information about data-related standards
Community-based standards development
• Activities that could advance community-based standards landscape (e.g., creating a collaborative workspace or an advising structure toward standards development, extension, or adoption).
• Gaps in community-based data standards of relevance to NIH research, including real use-cases (e.g., emerging fields, research domains with multiple existing data standards that could benefit from additional work, integration and/or reconciliation).
• Lessons learned from existing field-tested processes and infrastructure.
• Common challenges/pain points in development (e.g., methods for community engagement or building interoperability with other related standards).

2014 NIEHS/EPA Vocabulary Workshop
Purpose:
• Establish a collaborative and cross-disciplinary group to inform development of environmental health science language standards and applications that will aid data sharing, integration and analysis,
Considerations:
• Inventory existing resources
• Propose use cases
• Assess current semantic landscape
• Critical components of a common language framework
• Lessons from successful standards development
• Incentives, sustainability

Planned 2015 CBS Workshop
Themes:
• A Glimpse of Community Standards across the biomedical spectrum
• What is a community for the purposes of standards development? How do you identify change agents?
• Discuss lessons learned from similar community standards efforts. Pain points, and obstacles of efforts that either succeeded or failed.
• Identification of data standards for potential support. What kinds of characteristics should be considered for a need?
• End user engagement: Implementation, adoption, communication, feedback over the lifecycle
• What kinds of targeted support and assistance could accelerate the development and adoption of high quality data and metadata standards for NIH relevant research?

Future Directions

For more on BD2K:
http://bd2k.nih.gov/about_bd2k.html#sthash.qfVYTOK5.dpbs

For Community-based standards development:
• RFI, fall 2014
• Workshop, spring 2015
• Follow up? dearry@niehs.nih.gov

