<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Conceptual Architecture for AI-based Big Data Analysis and Visualization Supporting Metagenomics Research</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thoralf Reis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Krause</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco X. Bornschlegl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias L. Hemmje</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Hagen, Faculty of Mathematics and Computer Science</institution>
          ,
          <addr-line>58097 Hagen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>264</fpage>
      <lpage>272</lpage>
      <abstract>
        <p>This paper introduces an architecture for metagenomics research supported by Artificial Intelligence (AI) based Big Data Analysis and Visualization, building on the AI2VIS4BigData Reference Model. Metagenomics research covers the examination of huge amounts of data to improve the understanding of microbial communities. Technological and methodical improvements in Big Data Analysis drive progress in metagenomics research and thereby support practical applications such as the analysis of cattle rumen with the research goal of reducing the negative impact of cattle breeding on global warming. AI2VIS4BigData is a reference model for the combined application areas of Big Data Analysis, AI, and Visualization. Its purpose is to support scientific and industrial activities with guidelines and a common terminology to enable efficient exchange of knowledge and information and thereby prevent “reinventing the wheel”. The general applicability of the AI2VIS4BigData model for metagenomics has been validated in a previous publication. As a next step, this paper derives a conceptual architecture that specifies a possible adaptation of AI2VIS4BigData for metagenomics. For this, three further metagenomic publications utilizing AI and Visualization are assessed.</p>
      </abstract>
      <kwd-group>
        <kwd>Metagenomics</kwd>
        <kwd>Big Data</kwd>
        <kwd>AI</kwd>
        <kwd>Visualization</kwd>
        <kwd>AI2VIS4BigData</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction and Motivation</title>
      <p>
        Metagenomics research analyzes relationships within whole microbial
communities while genomics research focuses on the analysis of genes or the genome of
a single organism [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. A practical example of metagenomics research is the
investigation of the rumen microbiota regarding its influence on cattle greenhouse
gas emissions and food conversion efficiency [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as cattle are a major contributor
to climate change and relevant for food security, two significant challenges
society is facing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The demand for data in metagenomics research is significantly
larger than for regular genomics research: the investigation of relationships and
coherence between organisms or genes in and between metagenomic samples [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] requires biological researchers to process, store, and exchange large amounts
of data, e.g., via specialized bioinformatics databases [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Hence, metagenomics
research benefits on a large scale from progress and development in Big Data
Analysis such as decreasing costs for storage and processing of huge amounts of
data. In the EU-funded MetaPlat<sup>1</sup> project, scientists from different research
institutions with either a Big Data Analysis or a bioinformatics background worked
together to develop the MetaPlat platform. This cloud-based Big Data Analysis
platform is specialized in analyzing metagenomics data such as rumen microbiota
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For an effective analysis of Big Data, the platform empowers researchers
to utilize cutting-edge technology such as Artificial Intelligence (AI) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
various forms of Information Visualization (IVIS) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which provide the researchers with
visual feedback on their activities and enable them to identify new insights.
      </p>
      <sec id="sec-1-1">
        <p>
          To define the vague term Big Data, a popular approach is to follow the data
management challenges outlined by Doug Laney [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. These challenges comprise
three dimensions (the three V’s): variety (ambiguous data manifestations
regarding, e.g., data format, data structure, or data semantics), volume (large amounts of
data), and velocity (high-frequency data inflow) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. By this definition, the sheer
volume of data in metagenomics research allows labeling it as Big Data. The
collective term AI summarizes techniques and methods such as symbolic AI or
Machine Learning (ML) that implement intelligence in machines (in contrast to
human or animal natural intelligence) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Example application scenarios of AI in
metagenomics research of rumen are the analysis of data through clustering [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]
or the training of classifiers to categorize data samples [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Big Data and AI are
closely connected to each other [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]: Big Data is very useful to derive, validate,
apply, and enhance AI models while AI-driven algorithms enable the exploration
of Big Data and its potential. Visualization of data, processing steps, and
AI model development is an important link between both application areas. It
enhances comprehension and lowers entry barriers for new users. In addition,
visualization offers the chance to meet the growing demand for explainability
and transparency of AI.
        </p>
        <p>
          With [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], the AI2VIS4BigData Reference Model for research and practical
applications in the application areas Big Data Analysis, AI, and Visualization
was introduced. Its objective is to provide a common specification as well as a
common basis for discussion and thereby reduce the risk of inefficiency through
reinventing the wheel and solving problems that have already been solved
elsewhere. The reference model’s theoretical applicability was evaluated in an expert
round table workshop featuring presentations from three practical application
domains: health care, economics, and metagenomics [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Until now, the reference
model was validated only for one metagenomics research application [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. This
paper aims to validate the model against three further metagenomics research applications from
the MetaPlat project and to assess whether an architecture can be derived.
        </p>
        <p>Within the remainder of this paper, the AI2VIS4BigData Reference Model
and the three assessed metagenomics research publications from the MetaPlat
project are introduced, the pursued architecture modeling approach is presented
(Section 1.3), a multi-layered conceptual architecture is introduced in Section 2
together with an initial validation in Section 3, before the paper concludes by
outlining its contributions and providing an outlook (Section 4).</p>
        <sec id="sec-1-1-1">
          <p><sup>1</sup> https://metaplat.eu</p>
          <p>[Figure 1: AI2VIS4BigData Reference Model. Labels recoverable from the figure: Interaction &amp; Perception; AI Transparency, Explanation &amp; Data Privacy; AI Model Meta Information.]</p>
          <p><bold>1.1 AI2VIS4BigData Reference Model</bold></p>
          <p>
The AI2VIS4BigData Reference Model (Figure 1) was derived through
projecting the AI lifecycle phases of AIGO’s AI System Lifecycle [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] onto the
IVIS4BigData Reference Model [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] considering the different AI models (ML or
statistical AI models as well as symbolic AI models) for supporting Big Data
Analysis, AI data, and AI user stereotypes [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. It contains the three processing
steps Data Management &amp; Curation, Analytics, and Interaction &amp; Perception,
accompanied by a data intelligence layer for user interaction and User Interfaces
(UI) of IVIS4BigData [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ], a reference model that aims to “close the gap in
research with regard to information visualization challenges of Big Data
Analysis as well as context awareness” [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. AI2VIS4BigData introduces a model
deployment layer that spreads over the three processing steps [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. AI models
are executed directly within the data and information loop, which links the
deployed models to the input data they need for execution; the models compute output
data that is fed back into the Big Data Analysis system [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. The remaining
activities of AI system life cycle phases are displayed within the analytics layer as
Design, Implementation &amp; Training, Data Selection, Verification &amp; Validation
as well as Operation &amp; Monitoring and interconnected through bidirectional
arrows emphasizing the iterative nature of AI model design [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. The different
reference model elements are linked to six clearly distinguished AI user stereotypes
(model designer, domain expert, model deployment engineer, model operator,
model end user, and model governance officer) and four clearly distinguished
Big Data Analysis user stereotypes (system owner, data scientists, management
consultants as well as directors including C-levels) [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ].
          </p>
          <p><bold>1.2 Assessed Metagenomic Use Cases</bold></p>
          <p>
            In [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], a conceptual workflow for metagenomic studies was presented and
demonstrated using two previously published metagenomic use cases. The first of these
use cases [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] was the visualization of gene dependencies using a whole-genome
approach and a new framework for improved correlation measurement between
genes. The second publication [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] analyzed the relationship between
microbiomes in feces and rumen using a taxonomic analysis of partial genome
sequences (barcode sequences). Together they cover the two main branches of
metagenomic analyses (taxonomic and functional). It was shown in [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] that
the metagenomics analysis workflow extracted from these publications can be
mapped directly onto the AI2VIS4BigData Reference Model, thereby validating
its relevance for the field of metagenomics.
          </p>
          <p>This section introduces three additional publications in metagenomics
research that serve as a basis to further validate this conceptual
workflow and transform it into a generic architecture. The publications were selected because they
are practical examples of metagenomic analysis (which can represent Big Data
Analysis applications depending on the sample size), were carried out by different
researchers, and, most importantly, describe the usage of statistical methods
or ML as well as Visualization. Although all selected publications originate from
the MetaPlat project, they represent different research approaches such as
the analysis of genes or the analysis of Operational Taxonomic Units (OTUs). In addition, the homogeneous
MetaPlat terminology eases the architecture derivation.</p>
          <p>
            The publication A Metagenomics Analysis of Rumen Microbiome [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] by
P. Walsh et al. demonstrates a metagenomic analysis of the “Bos taurus” rumen
microbiome using ML models in a cloud-based environment. For optimal
performance and scalability, a queueing system is used between individual components,
thus enabling asynchronous and parallel execution. After raw sequence
data is imported into the system, it is written to one of these processing queues, which feed
into a similar metagenomic analysis workflow that uses the QIIME toolset to
perform data cleanup and clustering of sequences into Operational Taxonomic
Units (OTUs). The workflow assigns taxonomic labels to these OTUs. In an
analytics step, various ML models are used to classify the samples into phenotypes
using the taxonomic data of the previous steps as input. Finally, the
publication showcases various visualizations ranging from a taxonomic composition
chart to plots of algorithmic accuracy and other AI metrics.
          </p>
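          <p>The queue-based decoupling of processing components described for this publication can be illustrated with a minimal sketch. It is not the MetaPlat implementation: an in-process queue from the Python standard library stands in for a distributed queueing system, and the names run_stage, clean_read, gc_fraction, and run_pipeline are hypothetical placeholders for the cleanup and analysis components.</p>
          <preformat>
```python
import queue
import threading

SENTINEL = None  # assumption: None marks the end of the input stream


def run_stage(work, inbox, outbox):
    """Consume items from inbox, apply work, and forward results to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(work(item))


def clean_read(read):
    """Hypothetical cleanup step: strip whitespace and upper-case the read."""
    return read.strip().upper()


def gc_fraction(read):
    """Hypothetical analysis step: share of G and C bases (read must be non-empty)."""
    return (read.count("G") + read.count("C")) / len(read)


def run_pipeline(reads):
    """Push reads through two stages that only communicate via queues."""
    raw, cleaned, results = queue.Queue(), queue.Queue(), queue.Queue()
    stages = [
        threading.Thread(target=run_stage, args=(clean_read, raw, cleaned)),
        threading.Thread(target=run_stage, args=(gc_fraction, cleaned, results)),
    ]
    for stage in stages:
        stage.start()
    for read in reads:
        raw.put(read)
    raw.put(SENTINEL)
    for stage in stages:
        stage.join()
    output = []
    while True:
        item = results.get()
        if item is SENTINEL:
            break
        output.append(item)
    return output
```
          </preformat>
          <p>Because each stage only knows its inbox and outbox, a stage could in principle be replaced or run with more workers without touching its neighbours, which is the scalability argument made in the publication. With a single worker per stage, as in this sketch, the output order matches the input order.</p>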
          <p>
            In Analysis of Rumen Microbial Community in Cattle through the
Integration of Metagenomic and Network-based Approaches [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ], H. Wang et al.
functionally analyze the rumen microbial community in cattle through application of
a network-based approach: the authors construct a co-abundance network
utilizing the ”relative abundance of 1570 microbial genes” [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] that enables them to
identify functional modules. In doing so, they present a method to automatically
determine a cutoff threshold value to generate the co-abundance network in the
first place [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. While the first publication [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] uses partial sequences sufficient to
identify and analyse the taxonomic composition, this publication is based on
whole genome data which enables the analysis of genes. Together they cover
the two main branches of metagenomic studies. To construct the co-abundance
network used in the publication, the short reads generated by next-generation
sequencing platforms are assembled into longer sequences. These sequences are
then matched to the KEGG<sup>2</sup> database to identify genes (and associated
metadata) present in the samples. Using the relative abundances of these genes,
correlations can then be computed by analyzing how the abundance of one gene
affects the abundance of other genes across the various samples. Since the
presence or absence of a correlation is not always distinguishable from statistical
noise, a suitable cutoff value is then determined using an automated
computational method. Using this cutoff value, a network graph can be constructed that
represents genes as nodes and the correlation strength as the length of the edges
connecting these nodes.
          </p>
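          <p>The construction of a co-abundance network from relative gene abundances can be sketched as follows. The sketch rests on assumptions that are not taken from the publication: the Pearson coefficient serves as the correlation measure, the cutoff is passed in as a fixed number instead of being determined automatically, and the function names pearson and co_abundance_edges as well as the gene labels used with them are hypothetical.</p>
          <preformat>
```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long abundance vectors.

    Assumes neither vector is constant; otherwise the denominator is zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def co_abundance_edges(abundance, cutoff):
    """Return {(gene_a, gene_b): r} for all gene pairs with abs(r) >= cutoff.

    abundance maps a gene name to its relative abundance per sample.
    """
    edges = {}
    for gene_a, gene_b in combinations(sorted(abundance), 2):
        r = pearson(abundance[gene_a], abundance[gene_b])
        if abs(r) >= cutoff:
            edges[(gene_a, gene_b)] = r
    return edges
```
          </preformat>
          <p>The resulting dictionary is an edge list in which genes are nodes and the stored coefficient is the edge weight. A graph library or visualization component could consume it directly, for example by mapping stronger correlations to shorter edges.</p>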
          <p>
            As the third and last assessed publication, M. Wang et al., the authors of
Understanding the relationships between rumen microbiome genes and metabolites
to be used for prediction of cattle phenotypes [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] combined metabolomics with
metagenomics in order to identify differences in diets and methane emissions
from rumen metabolites and microbial genes. They analyzed 36 rumen samples
and identified differences in the response of rumen microbes to different basal
diets, which in turn affect cattle methane emissions [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The study starts
from gene abundance data of cattle rumen obtained from previous studies on
the experiment designed by Roehe et al. [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. The abundance data was cleansed
and transformed before conducting multiple activities to determine correlations
between genes and metabolites related to the differences in diets in the
experiment design. The correlation data was then used to build correlation networks
as well as various other plots and result tables.
          </p>
          <p><bold>1.3 Discussion, Conclusion, and Identification of Remaining Architectural Challenges</bold></p>
          <p>
            In order to arrive at a generic architecture that enables the management,
analysis, and visualization of metagenomic data as well as the fusion with other
health-related data and knowledge, the first step is mapping the introduced
metagenomics publications to the generic stages of the AI2VIS4BigData
Reference Model (“Data Management &amp; Curation”, “Analytics”, and “Interaction
&amp; Perception”). This mapping is straightforward, as all three publications include steps to
ingest, manage, or clean up metagenomic sequences, all of them include statistical
or ML methods for analytics, and all of them produce one or more
visualizations. The same was previously demonstrated in [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. Therefore, it is
proposed that an architectural model should explicitly model these stages.
          </p>
          <p>
            Looking at the papers in detail, further requirements for a comprehensive
architectural model can be derived: The first publication describes the importance
of using individual components that communicate through asynchronous
mechanisms such as queuing systems to achieve high performance and scalability.
The impact of Big Data and ML in Metagenomics is also mentioned as a
challenge in [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. A suitable architecture should therefore aim to separate individual
parts and components of the system where possible so that they can operate
and scale individually. The second publication [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] shows the need for additional
knowledge sources such as gene databases for the analysis of metagenomic
sequences. The proposed architecture should therefore support the ingestion and
persistence of these additional data sources into a knowledge network that can
be used by metagenomic workflows. The third publication [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] is important as it
does not start from raw sequence data but from intermediate results obtained
from other studies. The architecture should be able to reuse the same
intermediate results for several distinct analyses, which requires the persistence of these
intermediate results. This requirement also partially addresses the challenge of
“Reproducibility” mentioned in [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] and the area of “AI Transparency,
Explanation &amp; Data Privacy” of the AI2VIS4BigData model. All three publications
differ significantly in the exact steps executed in the analysis phase and the
visualizations produced. It is therefore important that the analysis is done in a
modular fashion where the order and type of steps are dynamic and that a wide
range of visualizations is supported.
          </p>
        </sec>
        <sec id="sec-1-1-2">
          <p><sup>2</sup> Kyoto Encyclopedia of Genes and Genomes, https://kegg.jp</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>AI2VIS4BigData Conceptual Architecture supporting Metagenomics Research</title>
      <p>
        This paper introduces the AI2VIS4BigData architecture (Figure 2) for
processing and analysis of metagenomic data in an AI and Big Data environment. It
was designed by extending the Big Data Analysis and Visualization architecture
of IVIS4BigData [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] with AI and metagenomic aspects in order to fulfill the
metagenomics requirements outlined in Section 1.3. The architecture is
vertically split into three pillars separating the components for metagenomics data
integration and processing (domain-specific input), AI and data science
modeling and configuration (AI analysis input) from the components responsible for
result visualization and data generation (output). This is based on the design
principle of Separation of Concerns (SoC) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and makes it easier to develop,
scale or exchange the components separately. Each of these three pillars is
structured into three layers following the Model-View-Controller (MVC) pattern [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
        with a shared persistence layer interconnecting all three pillars. The bottom
layer represents the model, the top layers represent the view, and the middle
layers contain the controllers. Metagenomics-specific architecture elements are a
dedicated user, knowledge and data artifacts within the input layer, assets and
knowledge networks in the persistence layer, as well as domain-specific end user
interfaces. The following brief description of the individual layers and components
follows the flow of data, starting with data input in the top left corner and ending with
result visualization in the top right corner:
      </p>
      <p>Knowledge &amp; Data Input. Within this layer, expert users or systems
ingest metagenomics-related knowledge and data into the system. This
information comprises biological and genetic knowledge (e.g. protein metadata or
knowledge automatically extracted from scientific publications) as well as
diagnostic and subject data (e.g. metagenomic sequences).</p>
      <p>AI Integration &amp; Fusion. This layer contains all services and methods
to integrate the various domain-specific inputs into the system, to perform a
data fusion and persist it as structured content or knowledge network. The
semantic integration is realized through implementation of the mediator wrapper
approach.</p>
      <sec id="sec-3-1">
        <p>[Figure 2: AI2VIS4BigData architecture. Labels recoverable from the figure: Knowledge &amp; Data Input; Model &amp; Configuration Input; End User Interface.]</p>
        <p>Model &amp; Configuration Input. The necessary knowledge and information
for configuring the AI applications within the system is provided by AI and data
science expert users within this layer. The input contains the required knowledge
to register and schedule all AI services and to select appropriate analysis methods
and algorithms. The additional AI2VIS4BigData role of the Governance Officer
ensures legal compliance and the maintenance of ethical standards by providing
relevant constraints.</p>
        <p>AI Analysis. The middle layer is responsible for performing analysis tasks
on behalf of the user. A workflow system together with a service registry
allow for flexible configuration of the required analysis steps while the scheduler
manages the execution of these steps on distributed or local computing nodes.
Intermediate and final results are stored persistently.</p>
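        <p>How a service registry, a workflow definition, and a scheduler could interact in this layer is shown in the following minimal sketch. It is an illustration under assumptions rather than a specification of the architecture: the class ServiceRegistry, the functions run_workflow and schedule, and the example services normalize and dominant_taxon are hypothetical, a local thread pool stands in for distributed computing nodes, and plain dictionaries stand in for the persistence layer.</p>
        <preformat>
```python
from concurrent.futures import ThreadPoolExecutor


class ServiceRegistry:
    """Maps service names to callables so workflows can be configured by name."""

    def __init__(self):
        self._services = {}

    def register(self, name, func):
        self._services[name] = func

    def lookup(self, name):
        return self._services[name]


def run_workflow(registry, steps, data, store):
    """Run the named steps in order and persist every intermediate result."""
    for index, name in enumerate(steps):
        data = registry.lookup(name)(data)
        store[(index, name)] = data
    return data


def schedule(registry, steps, samples, max_workers=2):
    """Execute one workflow per sample on a local thread pool."""
    stores = {sample_id: {} for sample_id in samples}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            sample_id: pool.submit(
                run_workflow, registry, steps, values, stores[sample_id]
            )
            for sample_id, values in samples.items()
        }
        results = {
            sample_id: future.result() for sample_id, future in futures.items()
        }
    return results, stores


def normalize(counts):
    """Hypothetical service: turn raw counts into relative abundances."""
    total = sum(counts.values())
    return {taxon: count / total for taxon, count in counts.items()}


def dominant_taxon(shares):
    """Hypothetical service: name of the most abundant taxon."""
    return max(shares, key=shares.get)
```
        </preformat>
        <p>Since the order and type of steps are given as a list of names, a different study can reuse the same registry with another step list, and the stored intermediate results remain available for later analyses.</p>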
        <p>Persistence. The persistence layer stores various types of data and
enables data exchange between the overlying layers. Raw data is stored in a data lake
with little to no processing performed, which improves the reproducibility and
transparency of the system. Structured data includes parsed genetic sequences,
intermediate results from analysis processes, and other kinds of schema-bound data.
Lastly, a knowledge network represents biological and medical knowledge
as well as semantic rules required for symbolic AI in a machine-readable way.</p>
      </sec>
      <sec id="sec-3-2">
        <p>AI Input/Output. The purpose of this layer is to intelligently interpret
the intentions of the system’s end user (e.g. through applying natural language
processing) and present the information that is relevant to them in a suitable
form (e.g. after performing a dimensionality reduction or selecting appropriate
visualization techniques).</p>
        <p>End User Interface. The end user interface layer contains the multimodal
interfaces through which the system’s end users access its data and
information. These interfaces comprise visualizations, reports and dialogue systems that
present the domain-specific artifacts (e.g. taxonomic compositions).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Initial Validation and Remaining Challenges</title>
      <p>The proposed architecture addresses all areas of the AI2VIS4BigData Reference
Model. The area “Data Management &amp; Curation” of the reference model is
addressed by the left pillar. The “Analytics” area is covered by the second pillar,
especially by the “AI Analysis” layer. Finally, the “Interaction &amp; Perception” area
is implemented through the right pillar. The architecture also implements all
requirements that were outlined previously. A detailed mapping of the
requirements to the architecture elements is beyond the scope of this paper but
is planned for future work. Individual components for input, analysis, and
visualization are strictly separated by the three pillars and communicate asynchronously
through the persistence layer, allowing for flexible scaling and Big Data
processing. Additional knowledge sources are supported by providing a data-agnostic
input layer together with a mediator wrapper architecture for data integration.
The persistence layer ensures that reproducibility and transparency are possible
by storing intermediate and final results. Finally, a flexible workflow system
and a service registry support the heterogeneity of metagenomic studies and
allow easy integration of new analysis methods. The remaining challenges for the
architecture comprise a harmonization with the IVIS4BigData architecture, a
generalization for application domains beyond metagenomics research, a
technical specification, and a proof-of-concept technical implementation. Since
the selected publications were limited to the MetaPlat project, the assessment of
the practical applicability of the introduced architecture in metagenomics research
beyond MetaPlat is a further remaining challenge.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Outlook</title>
      <p>In the course of this paper, three MetaPlat publications were assessed that
analyze rumen microbiota through metagenomics research utilizing Big Data
Analysis, AI, and visualization. The objective of this assessment was the derivation of
an AI2VIS4BigData-based conceptual architecture for real-life application in this
three-fold research area. The resulting AI2VIS4BigData conceptual architecture
supporting metagenomics research was introduced in Section 2. It consists of
seven layers arranged along the three levels of the MVC pattern. As an outlook,
future work is planned to overcome the challenges introduced in Section 3.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fuchs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mc Kevitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hemmje</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Walsh</surname>
          </string-name>
          , “
          <article-title>A Metagenomic Content and Knowledge Management Ecosystem Platform</article-title>
          ,”
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Palu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lawlor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Wassan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , “
          <article-title>A Metagenomics Analysis of Rumen Microbiome</article-title>
          ,”
          <source>Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine</source>
          , pp.
          <fpage>2077</fpage>
          -
          <lpage>2082</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Browne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Roehe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Dewhurst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Engel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hemmje</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Walsh</surname>
          </string-name>
          , “
          <article-title>Analysis of Rumen Microbial Community in Cattle through the Integration of Metagenomic and Network-based Approaches</article-title>
          ,”
          <source>2016 IEEE International Conference on Bioinformatics and Biomedicine</source>
          , pp.
          <fpage>198</fpage>
          -
          <lpage>203</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Dewhurst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Roehe</surname>
          </string-name>
          , “
          <article-title>Understanding the relationships between rumen microbiome genes and metabolites to be used for prediction of cattle phenotypes</article-title>
          ,” in
          <source>BIBE 2019; The Third International Conference on Biological Information and Biomedical Engineering</source>
          .
          <publisher-name>VDE</publisher-name>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Laney</surname>
          </string-name>
          , “
          <article-title>3D Data Management: Controlling Data Volume, Velocity, and Variety</article-title>
          ,” META Group,
          <source>Tech. Rep.</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. ISO, “
          <source>ISO/IEC JTC 1/SC 42 Artificial Intelligence</source>
          ,”
          <year>2018</year>
          . [Online]. Available: https://isotc.iso.org/livelink/livelink/open/jtc1sc42
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>T.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. X.</given-names>
            <surname>Bornschlegl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Hemmje</surname>
          </string-name>
          , “
          <article-title>Towards a Reference Model for Artificial Intelligence Supporting Big Data Analysis</article-title>
          ,” To appear in:
          <source>Proceedings of the 2020 International Conference on Data Science (ICDATA'20)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. --, “
          <article-title>AI2VIS4BigData: Qualitative Evaluation of a Big Data Analysis, AI, and Visualization Reference Model</article-title>
          ,” To appear in:
          <source>Lecture Notes in Computer Science</source>
          , vol.
          <volume>LNCS 10084</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Andrade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Afli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hemmje</surname>
          </string-name>
          , “
          <article-title>Understanding the Role of (Advanced) Machine Learning in Metagenomic Workflows</article-title>
          ,” To appear in:
          <source>Lecture Notes in Computer Science</source>
          , vol.
          <volume>LNCS 10084</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>OECD</surname>
          </string-name>
          ,
          <source>Artificial Intelligence in Society</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M. X.</given-names>
            <surname>Bornschlegl</surname>
          </string-name>
          , “
          <article-title>Advanced Visual Interfaces Supporting Distributed Cloud-Based Big Data Analysis</article-title>
          ,” Dissertation, University of Hagen,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>H.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dewhurst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Roehe</surname>
          </string-name>
          , “
          <article-title>Improving the Inference of Co-occurrence Networks in the Bovine Rumen Microbiome</article-title>
          ,”
          <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source>
          , p.
          <fpage>1</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>B. G. N.</given-names>
            <surname>Andrade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Bressani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R. C.</given-names>
            <surname>Cuadrat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Tizioto</surname>
          </string-name>
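          ,
          <string-name>
            <given-names>P. S. N.</given-names>
            <surname>de Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. B.</given-names>
            <surname>Mourão</surname>
          </string-name>
          ,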
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Coutinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Reecy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Koltes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Berndt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C. A.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J. C. P.</given-names>
            <surname>Palhares</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. C. A.</given-names>
            <surname>Regitano</surname>
          </string-name>
          , “
          <article-title>The structure of microbial populations in Nelore GIT reveals inter-dependency of methanogens in feces and rumen</article-title>
          ,”
          <source>Journal of Animal Science and Biotechnology</source>
          , vol.
          <volume>11</volume>
          , p.
          <fpage>6</fpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>R.</given-names>
            <surname>Roehe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Dewhurst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-A.</given-names>
            <surname>Duthie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Rooke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>McKain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Hyslop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Waterhouse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Freeman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Watson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Wallace</surname>
          </string-name>
          , “
          <article-title>Bovine Host Genetic Variation Influences Rumen Microbial Methane Production with Best Selection Criterion for Low Methane Emitting and Efficiently Feed Converting Hosts Based on Metagenomic Gene Abundance</article-title>
          ,”
          <source>PLOS Genetics</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>E. W.</given-names>
            <surname>Dijkstra</surname>
          </string-name>
          , “
          <article-title>On the role of scientific thought</article-title>
          ,” in
          <source>Selected writings on computing: a personal perspective</source>
          . Springer,
          <year>1982</year>
          , pp.
          <fpage>60</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>M.</given-names>
            <surname>Fowler</surname>
          </string-name>
          ,
          <source>Patterns of enterprise application architecture</source>
          .
          <publisher-name>Addison-Wesley Longman Publishing Co., Inc.</publisher-name>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>