=Paper=
{{Paper
|id=Vol-1327/19
|storemode=property
|title=Making Biomedical Data Usable: NIH Community-Based Data and Metadata Standards Efforts
|pdfUrl=https://ceur-ws.org/Vol-1327/icbo2014_paper_70.pdf
|volume=Vol-1327
|dblpUrl=https://dblp.org/rec/conf/icbo/DearryLBHM14
}}
==Making Biomedical Data Usable: NIH Community-Based Data and Metadata Standards Efforts==
ICBO 2014 Proceedings

Allen Dearry, Cindy Lawler, Rebecca Boyles, Astrid Haugen, Mike Huerta

National Institutes of Health

===Abstract===
The mission of the NIH Big Data to Knowledge (BD2K) initiative is to enable biomedical scientists to capitalize more fully on the Big Data being generated by research communities. With advances in technologies, these investigators are increasingly generating and using large, complex, and diverse datasets. However, the ability of researchers to locate, analyze, and use Big Data (and more generally all biomedical and behavioral data) is often limited for reasons related to access to relevant software and tools, expertise, and other factors. BD2K aims to develop the new approaches, standards, methods, tools, software, and competencies that will enhance the use of biomedical Big Data by supporting research, implementation, and training in data science and other relevant fields.

One initiative within BD2K is to establish community-driven frameworks for developing and using standards for data and metadata. Such standards enable broad data sharing and reuse of data generated across the full spectrum of NIH-relevant research, from single investigators conducting R01-driven research to large collaborative networks and consortia. Standards for the metadata that describe the samples and experiments associated with the data, in addition to standards for each of the data types themselves, would greatly facilitate (and are probably even required for) large-scale data sharing and data integration. NIH should help establish flexible frameworks for developing data and metadata standards for newly emerging data types that are expected to be used widely, thereby encouraging various biomedical research communities to develop such standards in coordinated ways. Priorities for standardization should be community-driven. Standards should be applicable to both research and clinical data, where appropriate. It will be necessary to address a range of issues, including developing common data formats and data elements for particular types of studies and linking established care standards to meaningful use standards for electronic health records (EHRs), to the extent possible.

This poster describes the process that NIH is initiating to guide the support and development of community-based standards.

===Background===
Big Data to Knowledge (BD2K): Overview

Overarching goal: By the end of this decade, enable a quantum leap in the ability of the biomedical research enterprise to maximize the value of the growing volume and complexity of biomedical data.

The BD2K initiative addresses four major aims that, in combination, are meant to enhance the utility of biomedical Big Data:

• To facilitate broad use of biomedical digital assets by making them discoverable, accessible, and citable.

• To conduct research and develop the methods, software, and tools needed to analyze biomedical Big Data.

• To enhance training in the development and use of methods and tools necessary for biomedical Big Data science.

• To support a data ecosystem that accelerates discovery as part of a digital enterprise.

===Making Data Useable===
Make Data Broadly Useable

Standards allow data to work with:

• Other data

• Software tools

• Data resources

====NIH Standards Information Resource====
• Encourage the adoption of existing, widely used standards

• Discourage unnecessary duplication of effort / reinventing wheel

====Support community-based standards efforts====
• Standards are used when community wants & supports them

• BD2K will develop routine ways to provide time limited support for particularly opportune community-based standards efforts

===Purpose===
Community-Based Data and Metadata Standards

The purpose of this initiative is to accomplish three main goals:

1) establish an internal NIH framework of policies, governance, administrative procedures, and funding to routinely support community-based standards efforts;

2) use that framework to provide catalytic extramural research support for particularly opportune efforts under BD2K, that are broadly relevant to NIH research; and

3) integrate the framework for standards development into other BD2K activities to identify and capitalize on potential synergisms.

The framework for standards development will include catalytic support, in the form of time-limited financial assistance, for convening, organizing, and logistics toward facilitating a community of practice that addresses well-formulated standards-related needs that may include creation or extension of a standard.

===Planning Activities===
====2013 NIEHS/EPA Language Workshop====
Purpose:

• Learn about standard language efforts in the field of environmental health science.

• Discuss the way forward for environmental health sciences terminology.

• Develop a local community of standard language expertise within the environmental health sciences.

====2014 NIEHS/EPA Vocabulary Workshop====
Purpose:

• Establish a collaborative and cross-disciplinary group to inform development of environmental health science language standards and applications that will aid data sharing, integration and analysis.

Considerations:

• Inventory existing resources

• Propose use cases

• Assess current semantic landscape

• Critical components of a common language framework

• Lessons from successful standards development

• Incentives, sustainability

===2013 BD2K Stds Framework Workshop===
Mapping the Landscape of Community Standards

Formulating, Conducting and Maintaining Community-Based Standards Efforts

• How is the need for a particular standards effort identified?

• What is the process used to assess and prioritize selected activities?

• How do participants contribute to the standards effort?

• What are the characteristics of the ongoing discussions/meetings?

• Are milestones or similar indicators of progress used, and if so, how?

• How is the product of the standards effort updated and assessed?

===Findings===
• Active data stewardship/curation adds value and is needed at some level; but we have no funding model to support data stewards and no way to measure the value of their contributions compared to, e.g., new research grants.

• Sociological barriers to data sharing (need for “culture change”) within and across communities are as serious as technological barriers.

• Many community-driven and community-developed standards already exist and more are being developed; these different solutions are just starting to meet at the interfaces between research disciplines.

===RFIs for Community Input===
====Information resources for data-related standards====
• Collect, organize, and make available trusted, systematically organized, and curated information about data-related standards

====Community-based standards development====
• Activities that could advance community-based standards landscape (e.g., creating a collaborative workspace or an advising structure toward standards development, extension, or adoption).

• Gaps in community-based data standards of relevance to NIH research, including real use-cases (e.g., emerging fields, research domains with multiple existing data standards that could benefit from additional work, integration and/or reconciliation).

• Lessons learned from existing field-tested processes and infrastructure.

• Common challenges/pain points in development (e.g., methods for community engagement or building interoperability with other related standards).

===Planned 2015 CBS Workshop===
Themes:

• A Glimpse of Community Standards across the biomedical spectrum

• What is a community for the purposes of standards development? How do you identify change agents?

• Discuss lessons learned from similar community standards efforts. Pain points, and obstacles of efforts that either succeeded or failed.

• Identification of data standards for potential support. What kinds of characteristics should be considered for a need?

• End user engagement: Implementation, adoption, communication, feedback over the lifecycle

• What kinds of targeted support and assistance could accelerate the development and adoption of high quality data and metadata standards for NIH relevant research?

===Future Directions===
For more on BD2K: http://bd2k.nih.gov/about_bd2k.html#sthash.qfVYTOK5.dpbs

For Community-based standards development:

• RFI, fall 2014

• Workshop, spring 2015

• Follow up?

dearry@niehs.nih.gov