<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paulo Pinheiro</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Henrique Santos</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>James Masters</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthew Johnson</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeanette A. Stingone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sofia Bengoa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcello Bax</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deborah L. McGuinness</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Columbia University</institution>
          ,
          <addr-line>New York, NY</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Icahn School of Medicine at Mount Sinai</institution>
          ,
          <addr-line>New York, NY</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Instituto Piaget</institution>
          ,
          <addr-line>Almada</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Rensselaer Polytechnic Institute</institution>
          ,
          <addr-line>Troy, NY</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Universidade Federal de Minas Gerais</institution>
          ,
          <addr-line>Belo Horizonte, MG</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Most scientific data repositories have minimal capabilities for integrating data within studies and even fewer for supporting data harmonization across multiple studies. To be published or analyzed, data must be organized, normalized, and harmonized so that high-quality datasets can be produced for dissemination and reuse. The Findable, Accessible, Interoperable, Reusable (FAIR) principles have proven to be a key benchmark for scientific data, laying the foundations for a more straightforward way to address these integration challenges in hybrid settings. The Human-Aware Data Acquisition Infrastructure (HADatAc) is data repository software that applies the FAIR principles to build and expose comprehensive knowledge, referred to as scientific knowledge graphs (SKGs), from scientific data, data dictionaries, and study documentation. HADatAc employs metadata templates to capture the semantics of studies and systematically represents scientific knowledge as RDF triples by annotating data points with community-built ontologies, providing users with features such as data browsing, faceted search, data summarization, and dataset generation. HADatAc has been used extensively in several efforts funded by the National Institutes of Health and IBM, as well as at higher education institutions in the United States, Brazil, Portugal, and Canada, to support scientific data management and sharing.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR
ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Scientific data integration and management pose challenges for scientists due to the large
amount and diversity of data. New techniques and high-speed data generation tools have
sparked an information revolution in scientific data management [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Scientists often face
limitations when contributing data to or retrieving data from repositories, and may need to adjust
their approach based on what the repositories can offer [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For example, a repository may
allow a scientist to select Alzheimer’s studies from a collection of mental health studies, but
may not provide the option to restrict a data download to specific variables.
      </p>
      <sec id="sec-2-1">
        <p>
          Existing semantic technologies, especially knowledge graphs (KGs) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], are well suited to bridge the gap between scientists’ data requirements and the data operation
capabilities of data repositories. They can enable scientists to rely less on ad hoc extract, transform, and load (ETL)
tools and scripts and more on a data repository’s semantic capabilities to acquire data tailored to
their needs. So far, such technologies have mostly been applied to dataset annotation [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ],
with only a few approaches partially covering the semantics of studies and their data [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>The Human-Aware Data Acquisition Infrastructure (HADatAc) is a semantic data repository
for managing scientific data acquired through multiple sources including instruments, sensors,
humans, and computer models. HADatAc builds a scientific knowledge graph (SKG) from
studies’ metadata, measurement data in datasets, and data dictionaries. HADatAc’s contextual
knowledge describes why data were acquired, how they came to be, and the many
decisions that may have affected data quality during their acquisition. Contextual knowledge
includes descriptions of study properties to support scientific activities, rationales for building
sensing capabilities to support observations and experiments, and characterization of entities
along with their quantities and qualities used to annotate acquired data.</p>
      </sec>
      <sec id="sec-2-2">
        <p>We demonstrate HADatAc in the context of the National Institute of Environmental Health
Sciences (NIEHS)-funded Human Health Exposure Analysis Resource (HHEAR) program, in
which HADatAc uses contextual knowledge to improve how data are further analyzed through
analytics and visualization solutions. Further, we provide a brief assessment of
HADatAc’s adherence to the Findable, Accessible, Interoperable, Reusable (FAIR) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] principles.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. The Human-Aware Data Acquisition Infrastructure</title>
      <sec id="sec-3-1">
        <p>HADatAc is a semantic scientific data repository based on the Resource Description Framework
(RDF) that uses a web application to enable users to access and use its underlying SKG. Figure 1
shows a high-level view of the infrastructure’s architecture. The front end is composed of six
web components (Section 2.1). The back end (Section 2.2) is decomposed into core components
that are responsible for storing the underlying SKG; a content ingestion component that is
responsible for annotating and moving content from data sources into the SKG; and the HAScO
API, which provides a standardized way of manipulating the SKG. The top of the
figure shows the types of content fed into HADatAc to build the SKG: instrument-provided content, such as data files and data streams; supporting ontologies providing community-curated knowledge (Section 2.3); and scientist-generated metadata templates (Section 2.4).</p>
      </sec>
      <sec id="sec-3-3">
        <sec id="sec-3-3-1">
          <title>2.1. HADatAc Web Interface</title>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <p>HADatAc’s front end allows users to interact with the SKG and provides user interface (UI)
capabilities to build, use, and share data. The UI can be described in terms of six subsystems.</p>
      </sec>
      <sec id="sec-3-5">
        <p>Three of these subsystems provide scientists with ways to search, find, select, and retrieve
content from the SKG. The Data Value Faceted Search and Study Faceted Search subsystems
let scientists find and select content, and the Dataset Generation
subsystem builds normalized datasets from those faceted search selections.</p>
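        <p>A toy sketch may help picture how faceted search narrows a selection before dataset generation: it counts facet values over a handful of records, keeps only the records matching the selected facet values, and projects the result onto fixed columns. The record fields, study names, and values are hypothetical toy data; HADatAc itself runs these operations against its SKG and search index rather than in-memory Python lists.</p>
        <preformat>
```python
from collections import Counter

# Hypothetical toy records; in HADatAc, measurements come from the SKG and search index.
RECORDS = [
    {"study": "StudyA", "variable": "creatinine", "value": 1.2},
    {"study": "StudyA", "variable": "MEHP", "value": 3.4},
    {"study": "StudyB", "variable": "MEHP", "value": 2.9},
    {"study": "StudyB", "variable": "maternal age", "value": 31},
]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet (e.g., per study)."""
    return Counter(r[facet] for r in records)


def select(records, **selections):
    """Keep records whose value for every selected facet is among the chosen values."""
    return [
        r for r in records
        if all(r[facet] in chosen for facet, chosen in selections.items())
    ]


def generate_dataset(records, columns):
    """Project the selected records onto a fixed column order, one tuple per record."""
    return [tuple(r[c] for c in columns) for r in records]


# Usage: select every MEHP measurement, then see which studies contribute to it.
mehp = select(RECORDS, variable={"MEHP"})
print(facet_counts(mehp, "study"))
print(generate_dataset(mehp, ["study", "value"]))
```
        </preformat>
        <p>Combining selections with AND across facets and OR within a facet is a common faceted-search convention; this sketch assumes it, and the text does not state whether HADatAc behaves exactly this way.</p>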
      </sec>
      <sec id="sec-3-6">
        <p>
          The Data Source Management subsystem is the main way of adding content to HADatAc:
it feeds content into the content ingestion component and
shows users whether content ingestion succeeded. In case of failure, a
log is created for each data source to identify ingestion problems. The Study Management
subsystem allows scientists to inspect cohort information and verify that study specifications
were ingested correctly. Similarly, the Instrument Management subsystem allows scientists to
verify whether instruments and supporting detectors are properly configured, whether they are deployed,
and on which platforms they are deployed. This is a key way of verifying whether standard
operating procedures are being followed when instruments are used to collect scientific
data. The SDD Editor [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] mimics the look and feel of a spreadsheet editor and
loads a Semantic Data Dictionary (SDD) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] mapping file, enabling users to create or modify
mappings between study data and ontology classes [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>Figure 1: HADatAc architecture. Inputs: ontologies (HAScO, SIO, PROV, VSTOI, ChEBI, UO, etc.), metadata templates (SDDs, SSDs, DPL, etc.), and data sources (data files, data streams, etc.) from individual instruments. Front end: Data Value Faceted Search, Study Faceted Search, Dataset Generation, SDD Editor, Data Source Management, Instrument Management, and Study Management. Back end: Content Ingestion (Data Ingestion and Metadata Ingestion).</p>
        <sec id="sec-3-6-1">
          <title>2.2. HAScO API, Core Components and Content Ingestion</title>
        </sec>
      </sec>
      <sec id="sec-3-7">
        <p>
          The HAScO (Human-Aware Science Ontology) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] API supports the storage of large volumes
of scientific data and the comprehensive description of the entities that compose a scientific study.
The infrastructure provides core features in support of scientific data repositories, including
a hybrid storage approach: a search engine/NoSQL database (Apache Solr [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]) and an RDF
triple store (Apache Jena Fuseki [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]). The SKG created by HADatAc uses the RDF model
and persists across this hybrid store, relying on scalable NoSQL storage for data and flexible
triple stores for metadata. Since all content follows the RDF model, the elements in both
stores are logically connected through object properties. The union of these
elements constitutes the overall knowledge graph.
        </p>
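        <p>The hybrid arrangement can be illustrated with a toy model in which metadata are held as triples, data points are held as documents, and both refer to the same URIs, so following a URI from a data point reaches its context in the graph. Plain Python sets and lists stand in for Jena Fuseki and Solr here, and the ex: URIs and property names are illustrative assumptions rather than HADatAc’s actual schema or storage layout.</p>
        <preformat>
```python
# Metadata as (subject, predicate, object) triples: a stand-in for the RDF triple store.
TRIPLES = {
    ("ex:DA-1", "rdf:type", "ex:DataAcquisition"),
    ("ex:DA-1", "ex:isDataAcquisitionOf", "ex:STD-1"),
    ("ex:STD-1", "rdfs:label", "Example cohort study"),
}

# Data points as documents: a stand-in for the NoSQL/search store.
# The URI-valued field is what ties each document back to the metadata graph.
DOCUMENTS = [
    {"id": "m1", "acquisition_uri": "ex:DA-1", "value": 3.4},
    {"id": "m2", "acquisition_uri": "ex:DA-1", "value": 2.9},
]


def objects(subject, predicate):
    """Return every object linked to the subject by the predicate, sorted."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def study_of(document):
    """Follow a data point's acquisition URI into the metadata graph to find its study."""
    studies = objects(document["acquisition_uri"], "ex:isDataAcquisitionOf")
    return studies[0] if studies else None


for doc in DOCUMENTS:
    study = study_of(doc)
    print(doc["id"], doc["value"], study, objects(study, "rdfs:label"))
```
        </preformat>
        <p>Keeping bulky data points out of the triple store while sharing URIs with it is the design point being illustrated; a real deployment would resolve these links with SPARQL and search-index queries instead of Python loops.</p>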
        <sec id="sec-3-7-1">
          <title>2.3. Foundational Ontologies</title>
        </sec>
      </sec>
      <sec id="sec-3-8">
        <p>
          To encode knowledge about scientific studies, HADatAc’s semantic data ingestion leverages
several science ontologies, including the Human-Aware Science Ontology (HAScO) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
the Semanticscience Integrated Ontology (SIO) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], and the Human Health Exposure Analysis Resource
(HHEAR) ontology [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. HAScO is used to encode knowledge about studies, study types, and data
elicitation from human subjects. SIO is used to encode knowledge about entities and their
properties.
        </p>
      </sec>
      <sec id="sec-3-9">
        <p>Scientific ontologies such as SIO tend to be domain-agnostic and are used exclusively to describe
data. They are not intended to support the handling of data throughout the scientific life cycle, which
depends on data organization needs and circumstances specified during the study design phase. HAScO supports scientific data
organization by defining the notions of collections (SampleCollection and TimeCollection) and
groups (SubjectGroup) of objects of interest, along with their relations.</p>
      </sec>
      <sec id="sec-3-10">
        <p>Other commonly used ontologies are the W3C PROV Ontology [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], for encoding provenance knowledge, and the Virtual Solar-Terrestrial Observatory (VSTO) ontology [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], for encoding knowledge about instruments and platforms.</p>
        <sec id="sec-3-10-1">
          <title>2.4. Metadata Templates for Knowledge Capture</title>
          <p>HADatAc captures scientific knowledge through the use of metadata templates (MTs). MTs
provide a framework for domain experts to identify and define the semantics of the study
elements, including study metadata, study object collections (e.g., cohorts), roles that object
collections play in studies (e.g., subjects, samples), object properties, and relationships among
object collections. MTs are encoded in a tabular format; when HADatAc interprets an MT,
each row is translated into RDF resources in the SKG according to the purpose of the MT
and the values in that row. Within each MT table, the hasUri column specifies the URI
to which a row is mapped in the knowledge graph.</p>
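          <p>A minimal sketch of the row-to-resource idea follows, assuming an MT exported as CSV: each row becomes one resource whose URI is taken from the hasUri column, and every other non-empty cell becomes a triple about that resource. Using the remaining column headers directly as predicates is a simplification made for illustration; how HADatAc actually interprets each column depends on the kind of MT, and the example rows are hypothetical.</p>
          <preformat>
```python
import csv
import io

# Hypothetical MT excerpt; only the hasUri column name is taken from the text.
MT_CSV = """hasUri,rdf:type,rdfs:label
ex:STD-1,hasco:Study,Example cohort study
ex:SOC-1,hasco:SubjectGroup,Enrolled mothers
"""


def mt_rows_to_triples(text, uri_column="hasUri"):
    """Translate each MT row into (subject, predicate, object) triples keyed by hasUri."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = (row.get(uri_column) or "").strip()
        if not subject:
            continue  # a row without a URI cannot be placed in the graph
        for column, cell in row.items():
            # Skip the URI column itself, malformed extra cells, and empty cells.
            if column and column != uri_column and isinstance(cell, str) and cell.strip():
                triples.append((subject, column, cell.strip()))
    return triples


for triple in mt_rows_to_triples(MT_CSV):
    print(triple)
```
          </preformat>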
          <p>Study (STD) specification: Studies are where data acquisition activities designed by humans,
i.e., scientists and engineers, are planned and executed. These activities acquire data that,
once analyzed, should be able to answer scientific questions. STD specifications are used to
capture and preserve knowledge from humans regarding their aims, scientific questions, and
data acquisition activities. One important property of an STD is the nature (or type) of the
study, which can be an observation, an empirical experiment, a computational experiment, or a
combination of those.</p>
        </sec>
      </sec>
      <sec id="sec-3-11">
        <p>Deployment (DPL) specification is used to comprehensively describe the measurement
infrastructure of a study. A DPL has several tables to capture metadata about the data acquisition
infrastructure of the study, including instruments, detectors attached to instruments, and
platforms where instruments are deployed. In addition, the DPL allows scientists to define
deployments, which state the combinations of these elements with which data have been
acquired.</p>
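        <p>To illustrate how deployments tie the DPL tables together, the sketch below models a deployment as one platform, one instrument, its detectors, and a time window, and answers the kind of question the Instrument Management subsystem supports: on which platforms is an instrument deployed on a given day? The field names, the time-window representation, and the ex: identifiers are assumptions for illustration, not the DPL’s actual columns.</p>
        <preformat>
```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Deployment:
    """One platform/instrument/detector combination used to acquire data (hypothetical fields)."""
    uri: str
    platform: str
    instrument: str
    detectors: List[str] = field(default_factory=list)
    start: date = date.min
    end: Optional[date] = None  # None means the deployment is still active

    def active_on(self, day: date) -> bool:
        """True if the deployment's time window covers the given day."""
        return day >= self.start and (self.end is None or self.end >= day)


def where_deployed(deployments, instrument, day):
    """Sorted platforms on which the instrument is deployed on the given day."""
    return sorted(
        d.platform for d in deployments
        if d.instrument == instrument and d.active_on(day)
    )


DEPLOYMENTS = [
    Deployment("ex:DPL-1", "ex:ClinicA", "ex:Questionnaire-1",
               start=date(2019, 1, 1), end=date(2019, 12, 31)),
    Deployment("ex:DPL-2", "ex:Lab-1", "ex:MassSpec-1", ["ex:Detector-1"],
               start=date(2020, 3, 1)),
]

print(where_deployed(DEPLOYMENTS, "ex:MassSpec-1", date(2021, 6, 1)))
```
        </preformat>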
      </sec>
      <sec id="sec-3-12">
        <p>Semantic Study Design (SSD) describes study objects known at the time studies are designed.
An SSD describes a scientific study in terms of its objects and object collections. Data in a study are values of objects’ properties. To properly organize study data, we need to be able to describe the study in terms of its objects.</p>
      </sec>
      <sec id="sec-3-14">
        <p>Semantic Data Dictionary (SDD) describes the meaning of values in terms of objects and
their properties. SDDs are composed of objects and attributes. Attributes are used to specify
object properties, including relationships among objects. In terms of spatial knowledge, objects
can be used to represent locations. For temporal knowledge, objects can be used to represent
events and time instants.</p>
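        <p>As a rough sketch of the object/attribute split, the example below treats an SDD as a mapping from each data-file column to the attribute it measures, the object that attribute describes, an optional unit, and an optional event object carrying temporal context, then uses it to annotate one raw data row. The column names, ex: identifiers, and dictionary layout are hypothetical and do not reproduce the actual SDD template.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnSpec:
    attribute: str               # the property being measured
    of_object: str               # the object (e.g., subject or sample) that has the property
    unit: Optional[str] = None
    event: Optional[str] = None  # an event object giving temporal context, if any


# Hypothetical SDD: data-file column name -> annotation.
SDD = {
    "age": ColumnSpec("ex:Age", "ex:Mother", unit="ex:Year"),
    "mehp_v1": ColumnSpec("ex:MEHPConcentration", "ex:UrineSample",
                          unit="ex:NanogramPerMilliliter", event="ex:Visit1"),
}


def annotate(row, sdd):
    """Attach SDD annotations to every value in the row that the SDD describes."""
    annotated = []
    for column, value in row.items():
        spec = sdd.get(column)
        if spec is None:
            continue  # columns the SDD does not describe stay unannotated
        annotated.append({
            "value": value,
            "attribute": spec.attribute,
            "object": spec.of_object,
            "unit": spec.unit,
            "event": spec.event,
        })
    return annotated


for item in annotate({"age": 31, "mehp_v1": 3.4, "notes": "n/a"}, SDD):
    print(item)
```
        </preformat>
        <p>Representing the visit as an object of its own, rather than as a suffix in a column name, is what lets the same attribute measured at different times be told apart semantically.</p>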
      </sec>
      <sec id="sec-3-15">
        <p>Stream specification (STR) identifies whether data are acquired as a stream of data files or of messages.
A data file stream can be composed of a single file. In addition to specifying the source of the data, streams also identify data ownership, data privacy, the deployed instrument used to collect the data, and, most importantly from a data acquisition point of view, which SDD is used to ingest the data from files and messages into the KG.</p>
      </sec>
      <sec id="sec-3-17">
        <p>When MTs and data files are ingested, HADatAc constructs an SKG with all the metadata,
as seen in Figure 2. The SKG has five larger areas of knowledge representation: (A) scientific
activities, as specified in the STD, DPL, and STR MTs; (B) instruments, as specified in the DPL MT;
(C) data schemas, as specified in the SSD MT; (D) object collections, as specified in the SSD MT; and (E) the
semantically annotated data repository.</p>
      </sec>
      <sec id="sec-3-18">
        <p>Figure 2: The SKG constructed by HADatAc, with areas (A) scientific activities, (B) instruments, (C) data schemas, (D) object collections, and (E) the semantically annotated data repository. Classes and properties shown include hasco:Study, hasco:DataAcquisition, hasco:DataAcquisitionSchema, vstoi:Deployment, vstoi:Instrument (e.g., vstoi:Questionnaire, vstoi:PhysicalInstrument), vstoi:Platform, vstoi:Detector, hasco:isDataAcquisitionOf, hasco:hasDeployment, vstoi:hasInstrument, vstoi:hasPlatform, hasco:hasDetector, hasco:hasSchema, hasco:hasAttribute, hasco:hasUnit, and hasco:isMemberOf.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. HHEAR Data Center Use Case</title>
      <sec id="sec-4-1">
        <p>
          The Human Health Exposure Analysis Resource (HHEAR) is an environmental health sciences
research program established by the National Institute of Environmental Health Sciences [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
HHEAR provides environmental epidemiology researchers with laboratory analysis of their
environmental and biological specimens and integrates the laboratory results and statistical
analyses with the original study data. The HHEAR Data Center makes the original study data,
the HHEAR laboratory results, and statistical analyses available to the public<sup>1</sup> as a means to
improve our knowledge of the comprehensive effects of environmental exposures on human
health throughout the life course and “to catalyze new scientific insight from the colocation,
integration, and advanced statistical and data science analysis of multimodal data sets.”
        </p>
        <p>One challenge in HHEAR is integrating, normalizing, and harmonizing data from the several
studies accepted by the program. For example, pregnancy cohorts provided urine samples across
almost all studies, and several laboratory analyses were performed across many studies, such
as measurement of environmental exposures including phthalate metabolites. Because many
study variables are highly contextualized by both timing of measurement and study-specific
characteristics, a solution that leveraged a semantic infrastructure was the most amenable to
the program. The HHEAR Data Center uses the HADatAc infrastructure, Semantic Data
Dictionaries, and well-established biomedical domain ontologies to model the metadata, collected
data, and HHEAR laboratory analyses for the studies accepted by HHEAR and to build a single
harmonized knowledge graph from these components. Domain ontologies and SDDs are used
to normalize the semantics of each study variable, ensuring that variables across studies that
share common specifications are aligned using the same formal terminology. When a concept
is not covered by an established biomedical ontology, the HHEAR application
ontology fills the gap in coverage. We publish the HHEAR Ontology on BioPortal and release new versions whenever
we add completed studies to the knowledge graph.</p>
      </sec>
      <sec id="sec-4-2">
        <p>The HHEAR Harmonized Data Repository comprises a production instance of HADatAc,
together with all of the loaded study metadata, measurement data, and ontology content. The
HHEAR community can access the Harmonized Data Repository directly to search the data
using HADatAc’s built-in data and study search capabilities and to generate normalized datasets
from the search results. Custom faceted search tools that leverage the SKGs built using HADatAc’s
infrastructure have also been integrated into the HHEAR Harmonized Data Repository and
allow users to generate multi-study normalized datasets via HADatAc’s APIs.</p>
      </sec>
      <sec id="sec-4-3">
        <p>The ability to create normalized datasets across multiple studies is a significant tool for the
research community because it enables data pooling across multiple HHEAR studies in which
all of the concepts share the same vocabulary. This ensures that when variables
from different studies share the same context, their values appear in the
same column. It also ensures that when categorical variables from different studies refer to the
same entities, the values and codes that appear in the dataset are globally unique and directly
tied to the ontology terms that define the category values.</p>
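        <p>The pooling behavior can be sketched as follows, assuming each study’s local column names and categorical codes have already been mapped to shared ontology terms (in HADatAc, this mapping comes from the studies’ SDDs). The pooled table uses those shared terms as column headers and category values, so equivalent variables from different studies land in the same column. The study names, local codes, and ex: terms are hypothetical.</p>
        <preformat>
```python
# Hypothetical per-study mappings from local column names to shared ontology terms.
COLUMN_MAP = {
    "StudyA": {"mat_age": "ex:MaternalAge", "smoke": "ex:SmokingStatus"},
    "StudyB": {"AGE_MOM": "ex:MaternalAge", "SMK": "ex:SmokingStatus"},
}

# Hypothetical per-study mappings from local category codes to shared category terms.
CODE_MAP = {
    "StudyA": {"smoke": {"1": "ex:CurrentSmoker", "0": "ex:NonSmoker"}},
    "StudyB": {"SMK": {"Y": "ex:CurrentSmoker", "N": "ex:NonSmoker"}},
}


def harmonize(study, row):
    """Rename a study row's columns and recode its categories into shared terms."""
    out = {"study": study}
    for local, value in row.items():
        term = COLUMN_MAP[study].get(local)
        if term is None:
            continue  # variables without a shared term are not pooled
        codes = CODE_MAP[study].get(local)
        out[term] = codes.get(value, value) if codes else value
    return out


def pool(rows_by_study):
    """Harmonize rows from every study and lay them out under one shared column set."""
    rows = [harmonize(s, r) for s, study_rows in rows_by_study.items() for r in study_rows]
    columns = sorted({c for r in rows for c in r})
    return columns, [[r.get(c) for c in columns] for r in rows]


columns, table = pool({
    "StudyA": [{"mat_age": 29, "smoke": "0"}],
    "StudyB": [{"AGE_MOM": 34, "SMK": "Y", "site_code": 7}],
})
print(columns)
for line in table:
    print(line)
```
        </preformat>
        <p>Because the category values are ontology terms rather than study-local codes such as "1" or "Y", a value in the pooled dataset means the same thing regardless of which study it came from.</p>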
        <p>Table 1 provides some of the relevant overall statistics of the most recent release of the
Harmonized Data Repository. Access to the HHEAR Portal and Harmonized Data Repository
is available globally to any researcher who is affiliated with an academic or other institution
with an Institutional Review Board, or its equivalent. Prospective users must also agree to the
terms and conditions of the data use agreement.<sup>2</sup> A walkthrough of the HHEAR Data Center is
available as an appendix (Section 7).</p>
      </sec>
      <sec id="sec-4-4">
        <p><sup>1</sup> https://hheardatacenter.mssm.edu/. Due to policies beyond the control of the authors, the HHEAR Data Center website is only accessible within the United States.</p>
      </sec>
      <sec id="sec-4-5">
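        <p><sup>2</sup> https://hheardatacenter.mssm.edu/Register/Terms</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Overall statistics of the most recent release of the HHEAR Harmonized Data Repository.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Studies</th><th>Subjects</th><th>Variables</th><th>Active users</th><th>Data sets</th><th>Measurements</th></tr>
            </thead>
            <tbody>
              <tr><td>31</td><td>16,518</td><td>1,259</td><td>219</td><td>142</td><td/></tr>
            </tbody>
          </table>
        </table-wrap>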
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. FAIR Assessment</title>
      <sec id="sec-5-1">
        <p>The FAIR guidelines comprise four categories of principles: Findable, Accessible, Interoperable, and
Reusable [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. We have evaluated the HADatAc infrastructure through the lens of the FAIR
principles to highlight how HADatAc can help scientists publish high-quality data repositories.
We summarize the evaluation in Figure 3, listing the FAIR guidelines and associated principles
and whether HADatAc meets each principle.</p>
      </sec>
      <sec id="sec-5-3">
        <p>
          HADatAc meets the “Findable” guideline by using community-built RDF
ontologies that assign unique Internationalized Resource Identifiers (IRIs) to classes,
relations, and entities within their domains; this enriches the metadata representation
and allows data publishers to map measurements down to the instrument level. The “Accessible”
guideline is met through SPARQL [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], which allows users to make precise requests based on
any combination of variables, studies, data sources, and time restrictions so that derived
datasets can be generated from HADatAc. HADatAc meets the “Interoperable” guideline by
using the SKG to represent all study metadata and annotating every data point with elements
in the SKG. The “Reusable” guideline is only partially met because HADatAc delegates the metadata usage
license to the repository owner and does not currently encode license metadata in the graph.
        </p>
      </sec>
      <sec id="sec-5-4">
        <p>HADatAc supports the FAIR adherence of HHEAR data by providing the infrastructure to create
custom faceted search tools, which improves findability and accessibility. Importantly, because
it leverages standardized terminology from ontologies rather than the original study variables, it
promotes reusability by clarifying the meaning of measurements and assessments reported in
the datasets.</p>
        <table-wrap id="fig3">
          <label>Figure 3</label>
          <caption>
            <p>Summary of the FAIR assessment of HADatAc (✓ = principle met, ✗ = principle not met).</p>
          </caption>
          <table>
            <thead>
              <tr><th>FAIR guideline</th><th>Principles (HADatAc meets)</th><th>Score</th></tr>
            </thead>
            <tbody>
              <tr><td>Findable</td><td>F1 ✓, F2 ✓, F3 ✓, F4 ✓</td><td>4/4</td></tr>
              <tr><td>Accessible</td><td>A1 ✓, A1.1 ✓, A1.2 ✓, A2 ✓</td><td>4/4</td></tr>
              <tr><td>Interoperable</td><td>I1 ✓, I2 ✓, I3 ✓</td><td>3/3</td></tr>
              <tr><td>Reusable</td><td>R1 ✓, R1.1 ✗, R1.2 ✓, R1.3 ✗</td><td>2/4</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Community</title>
      <p>
        HADatAc is an open-source project<sup>3</sup> that has been under development since 2014 [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Its
source code was moved to a public GitHub repository<sup>4</sup> on July 5, 2015.
      </p>
      <sec id="sec-6-1">
        <p><sup>3</sup> https://hadatac.org</p>
      </sec>
      <sec id="sec-6-2">
        <p><sup>4</sup> https://github.com/hadatac/hadatac</p>
        <p>HADatAc.org comprises fifteen organizations from four countries: the United States (5
universities and 3 research groups); Portugal (2 universities and 1 private company); Brazil (2
universities and 1 national research organization); and Canada (1 university). The infrastructure
includes comprehensive online documentation<sup>5</sup> with instructions on how to install and use the
infrastructure. The HAScO, VSTO, and SIO ontologies that form HADatAc’s foundation are all
available on BioPortal.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Related Work &amp; Conclusion</title>
      <p>
        According to [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], scientific research data should ideally be shared through domain-specific
repositories that use data types widely employed in a field. These repositories are like data
warehouses, providing long-term access to data by assigning persistent IDs such as digital object
identifiers (DOI). ImmPort is a platform that collects and curates immunological data, which is
then shared through a public component [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. ImmPort’s domain is strictly for immunology [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ],
and the data model is not schema-free. The platform has some level of data lifting, although its
significance is unclear. The NIMH National Data Archive (NDA) uses Global Unique Identifiers
(GUIDs) to identify data from unique individuals [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Tools allow data download and search, and users must pass validation against the data dictionary when uploading any data.
      </p>
      <sec id="sec-7-1">
        <p>
          Domain-agnostic repositories, such as Figshare [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] or Zenodo [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], are generally chosen by investigators to deposit scientific data, along with metadata that accurately describes the included files and their format.
However, these approaches do not provide any further data integration or harmonization, storing
the data “as-is.” European Data Spaces [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] is a policy that proposes a foundation for the “data
economy” in Europe, which has fostered the development of several data frameworks [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
        </p>
      </sec>
      <sec id="sec-7-2">
        <p>The availability of entity characterization, along with logical linkage between data and
scientific study knowledge, is one of HADatAc’s benefits that scientists may immediately
observe and recognize when processing scientific data. HADatAc has proven to be a useful tool
for integrating data from multiple domains. Through the use of ontologies as shared metadata
standards, data are annotated, integrated, and stored into a knowledge base. The metadata can
then be used to query the knowledge base to retrieve relevant datasets without the domain
expert having detailed knowledge of the original structures of these datasets.</p>
      </sec>
      <sec id="sec-7-3">
        <p>
          Beyond the HHEAR program, HADatAc has become an important tool for the SKG community
across multiple projects and scientific domains, supporting, for example, research efforts within
Rensselaer Polytechnic Institute that use the National Health and Nutrition Examination Survey
(NHANES)<sup>6</sup> [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], promoting semantically enabled data analysis [
          <xref ref-type="bibr" rid="ref28 ref29">28, 29</xref>
          ]. HADatAc is also being
used to support projects involving the Internet of Things (IoT) in Europe, where we observe
that usage scenarios can be far more complex, since objects like cars and buildings can have
thousands of sensors organized into many subsystems.
        </p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <sec id="sec-8-1">
        <p><sup>5</sup> https://github.com/paulopinheiro1234/hadatac/wiki</p>
      </sec>
      <sec id="sec-8-2">
        <p><sup>6</sup> https://www.cdc.gov/nchs/nhanes/index.htm</p>
      </sec>
      <sec id="sec-8-3">
        <p>The HHEAR Data Center is funded by National Institute of Environmental Health Sciences
grant U2CES026555. Publicly available data used in this study were generated through grants
supported by the National Institutes of Health as part of the Human Health Exposure
Analysis Resource (HHEAR). The content is solely the responsibility of the authors and does not
necessarily represent the official views of the National Institutes of Health.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>7. Appendices</title>
      <sec id="sec-9-1">
        <title>HHEAR Data Center walkthrough</title>
        <p>[Figure: HHEAR Data portal home page, powered by HADatAc. Annotated elements include counts related to currently published HHEAR studies and the variable search and dataset retrieval functions, which allow search by specific study variables. All contents on this page are dynamic and retrieved from a HADatAc-built SKG using SPARQL.]</p>
        <p>[Figure: Results of a variable search. Annotated elements: the selected variables, the studies containing the selected variables, the involved cohorts, and an indication that the variable data were collected at two time points.]</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abbasian Dehkordi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Farajzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rezazadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Farahbakhsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sandrasegaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Abbasian</given-names>
            <surname>Dehkordi</surname>
          </string-name>
          ,
          <article-title>A survey on data aggregation techniques in IoT sensor networks</article-title>
          ,
          <source>Wireless Networks</source>
          <volume>26</volume>
          (
          <year>2020</year>
          )
          <fpage>1243</fpage>
          -
          <lpage>1263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Ramalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pernici</surname>
          </string-name>
          ,
          <article-title>Challenges of a Data Ecosystem for scientific data</article-title>
          ,
          <source>Data &amp; Knowledge Engineering</source>
          <volume>148</volume>
          (
          <year>2023</year>
          )
          <fpage>102236</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaudhri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Baru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Chittar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Genesereth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyanpur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs: introduction, history, and perspectives</article-title>
          ,
          <source>AI Magazine</source>
          <volume>43</volume>
          (
          <year>2022</year>
          )
          <fpage>17</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Manghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mannocci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sacharidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salatino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vergoulis</surname>
          </string-name>
          ,
          <article-title>New trends in scientific knowledge graphs and research impact assessment</article-title>
          ,
          <source>Quantitative Science Studies</source>
          <volume>2</volume>
          (
          <year>2021</year>
          )
          <fpage>1296</fpage>
          -
          <lpage>1300</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Chalk</surname>
          </string-name>
          ,
          <article-title>SciData: a data model and ontology for semantic representation of scientific data</article-title>
          ,
          <source>Journal of Cheminformatics</source>
          <volume>8</volume>
          (
          <year>2016</year>
          )
          <fpage>54</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pinheiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stingone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>A semi-automated approach to data harmonization across environmental health studies</article-title>
          ,
          <source>in: ISEE Conference Abstracts</source>
          , volume
          <volume>2020</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCusker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pinheiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Bax</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Stingone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>The Semantic Data Dictionary – an approach for describing and annotating data</article-title>
          ,
          <source>Data Intelligence</source>
          (
          <year>2020</year>
          )
          <fpage>443</fpage>
          -
          <lpage>486</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Stingone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>Complex semantic tabular interpretation using SDD-Gen</article-title>
          ,
          <source>in: 2024 IEEE 18th International Conference on Semantic Computing (ICSC)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>317</fpage>
          -
          <lpage>322</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pinheiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bax</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>McCusker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>Annotating Diverse Scientific Data with HAScO</article-title>
          ,
          <source>in: Proceedings of the Seminar on Ontology Research in Brazil 2018 (ONTOBRAS 2018)</source>
          , São Paulo, SP, Brazil,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <collab>Apache Software Foundation</collab>
          , Apache Solr, http://solr.apache.org,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <collab>Apache Software Foundation</collab>
          , Apache Jena, https://jena.apache.org,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          , et al.,
          <article-title>The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery</article-title>
          ,
          <source>Journal of Biomedical Semantics</source>
          <volume>5</volume>
          (
          <year>2014</year>
          )
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Viet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Falman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. S.</given-names>
            <surname>Merrill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Faustman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Savitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mervish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Barr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Peterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Balshaw</surname>
          </string-name>
          , et al.,
          <article-title>Human health exposure analysis resource (HHEAR): A model for incorporating the exposome into health studies</article-title>
          ,
          <source>International Journal of Hygiene and Environmental Health</source>
          <volume>235</volume>
          (
          <year>2021</year>
          )
          <fpage>113768</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lebo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>PROV-O: The PROV Ontology</article-title>
          ,
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2013</year>
          . URL: https://www.w3.org/TR/2013/REC-prov-o-20130430/.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cinquini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>West</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Benedict</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Middleton</surname>
          </string-name>
          ,
          <article-title>Ontology-supported scientific data frameworks: The virtual solar-terrestrial observatory experience</article-title>
          ,
          <source>Computers &amp; Geosciences</source>
          <volume>35</volume>
          (
          <year>2009</year>
          )
          <fpage>724</fpage>
          -
          <lpage>738</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <article-title>SPARQL 1.1 Query Language</article-title>
          ,
          <year>2013</year>
          . URL: https://www.w3.org/TR/sparql11-query/.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pinheiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Klawonn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chastain</surname>
          </string-name>
          ,
          <article-title>Semantic Support for Complex Ecosystem Research Environments</article-title>
          ,
          <source>AGU Fall Meeting Abstracts</source>
          <volume>33</volume>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Byrd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <article-title>Responsible, practical genomic data sharing that accelerates research</article-title>
          ,
          <source>Nature Reviews Genetics</source>
          <volume>21</volume>
          (
          <year>2020</year>
          )
          <fpage>615</fpage>
          -
          <lpage>629</lpage>
          . Publisher: Nature Publishing Group.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dunn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schaefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Zalocusky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Shankar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Shen-Orr</surname>
          </string-name>
          , et al.,
          <article-title>ImmPort, toward repurposing of open access immunological assay data for translational and clinical research</article-title>
          ,
          <source>Scientific Data</source>
          <volume>5</volume>
          (
          <year>2018</year>
          )
          <fpage>180015</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.-A.</given-names>
            <surname>Sansone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cruse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thorley</surname>
          </string-name>
          ,
          <article-title>High-quality science requires high-quality open data infrastructure</article-title>
          ,
          <source>Scientific Data</source>
          <volume>5</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Huerta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>McAuliffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Farber</surname>
          </string-name>
          ,
          <article-title>Sharing heterogeneous data: the national database for autism research</article-title>
          ,
          <source>Neuroinformatics</source>
          <volume>10</volume>
          (
          <year>2012</year>
          )
          <fpage>331</fpage>
          -
          <lpage>339</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Thelwall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kousha</surname>
          </string-name>
          ,
          <article-title>Figshare: a universal repository for academic resource sharing?</article-title>
          ,
          <source>Online Information Review</source>
          <volume>40</volume>
          (
          <year>2016</year>
          )
          <fpage>333</fpage>
          -
          <lpage>346</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Sicilia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>García-Barriocanal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sánchez-Alonso</surname>
          </string-name>
          ,
          <article-title>Community curation in open dataset repositories: insights from Zenodo</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>106</volume>
          (
          <year>2017</year>
          )
          <fpage>54</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <article-title>A European strategy for data</article-title>
          ,
          <year>2020</year>
          . URL: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52020DC0066.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>B.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <article-title>A federated infrastructure for European data spaces</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>65</volume>
          (
          <year>2022</year>
          )
          <fpage>44</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>H.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pinheiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>Knowledge Graph Construction from Data, Data Dictionaries, and Codebooks: the National Health and Nutrition Examination Surveys Use Case</article-title>
          ,
          <year>2022</year>
          . URL: https://us2ts.org.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pinheiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <article-title>Demographic and socioeconomic determinants of access to care: A subgroup disparity analysis using new equity-focused measurements</article-title>
          ,
          <source>PLOS ONE</source>
          <volume>18</volume>
          (
          <year>2023</year>
          )
          <elocation-id>e0290692</elocation-id>
          . Publisher: Public Library of Science.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pinheiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Bennett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <article-title>Towards Machine-Assisted Biomedical Data Preparation: A Use Case on Disparity in Access to Health Care</article-title>
          ,
          <source>in: Proceedings of the 6th Workshop on Semantic Web Solutions for Large-Scale Biomedical Data Analytics</source>
          , volume
          <volume>3466</volume>
          <source>of CEUR Workshop Proceedings</source>
          , CEUR, Hersonissos, Greece,
          <year>2023</year>
          . ISSN: 1613-0073.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>