<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A pipeline for functional and visual analytics of microbial genetic networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Leandro Corrêa</string-name>
          <email>hscleandro@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ronnie Alves</string-name>
          <email>ronnie.alves@itv.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabiana Goés</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristian Chaparro</string-name>
          <email>cristian.chaparro@itv.org</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucinéia Thom</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>PPGC - Federal University of Rio Grande do Sul</institution>
          ,
          <addr-line>Porto Alegre</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>PPGCC - Federal University of Pará</institution>
          ,
          <addr-line>Belém</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Vale Institute of Technology</institution>
          ,
          <addr-line>Belém</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Microorganisms abound everywhere. Although we know they play key roles in several ecosystems, too little is known about how these complex communities work. To act as a community, they must interact with each other to reach a stability in which the proper functions help them adapt to and survive harsh conditions. Thus, to effectively understand microbial genetic networks it is necessary to explore them by means of systems biology. An important challenge in systems biology is to determine the structures and mechanisms by which these complex networks control cell processes. In this paper, we present the FUNN-MG pipeline for functional and visual analytics of microbial genetic networks, which makes it possible to uncover strong interactions inside microbial communities.</p>
      </abstract>
      <kwd-group>
        <kwd>systems biology</kwd>
        <kwd>gene and pathway enrichment analysis</kwd>
        <kwd>graph representation</kwd>
        <kwd>graph visualization</kwd>
        <kwd>metagenomics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Microorganisms abound in every part of the biosphere, including soil, hot springs,
the ocean floor, the upper atmosphere, rocks deep within the Earth’s
crust and human tissues. They are extremely adaptable to conditions in which
no other organism would be able to survive.</p>
      <p>Their adaptability is mainly due to the fact that they live in complex
communities. Interactions inside the microbial networks play essential roles in the
maintenance and survival of the community. Unfortunately, too little is known
about microbial interactions.</p>
      <p>
        With the recent advent of High-Throughput Sequencing (HTS) technologies,
metagenomic sequencing approaches have been applied to characterize
diverse microbial communities, including targeted sequencing of the
phylogenetic marker gene encoding 16S rRNA and whole-metagenome shotgun
sequencing [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. (Metagenomics is a discipline that enables the study of the (meta)genomes of
uncultured microorganisms [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].) Additionally, numerous computational
tools and methodologies have been developed for the effective interpretation and
visualization of taxonomic and metabolic profiles of complex microbial
communities, with applications in several domains such as agriculture
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], medicine [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and biomineralization [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Despite the large advances in computational technologies for metagenomics
analysis, there is still a lack of proper tools to highlight the key interactions
in microbial communities and, consequently, the genes associated with essential
metabolic pathways [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This task is usually referred to as functional analysis of
microbial genetic networks, and most of the available pipelines deal with a list
of microbial genes rather than with interactions. Thus, genomics highlights the
“static” view of the genes available in a metagenome, but the interactions, as
well as the functions that will be performed, must be evaluated by an enrichment
analysis over a proper database of metabolic pathways such as KEGG.
      </p>
      <p>Metagenomics data analysis poses challenges that could be handled with
Machine Learning (ML) techniques. In fact, ML has been applied
successfully to several genomics problems. In the context of functional analysis, it
can provide new ways to explore graphs by using robust statistics, dealing with
uncertainty in the data and boosting the search for "hot spots" in large microbial
genetic networks.</p>
      <p>In this work we propose a computational pipeline to evaluate the functional
enrichment of microbial genetic networks. A weighted graph is built from
the genes and pathways induced from the relative abundance of the
metabolic pathways enriched by the associated metagenomic data. In addition,
unsupervised ML is applied to enumerate network components (clusters) of
microbial genes presenting strong evidence of both interaction and functional
enrichment.</p>
      <p>The main contributions of the proposed strategy are:
– a functional enrichment analysis which takes into account microbial gene
interactions;
– a new visual analytics system to explore interactively the enriched metabolic
pathways in microbial genetic networks;
– the FUNN-MG R pipeline for the identification of network components
(clusters) having strong functional enrichment in microbial communities.</p>
      <p>Metagenomic pathway-centric network analysis</p>
      <p>Metagenomic data analysis is a complex analytical task in both the biological and
computational senses. In sequence-based metagenomics, researchers focus on
finding the entire genetic sequence, that is, the pattern of the four different nucleotide
bases (A, C, G, and T) in the DNA strands found in a sample. The sequence
can then be analyzed in many different ways. For instance, researchers can use
the sequence to analyze the genome of the community as a whole, which can
offer insights about population ecology, evolution and functioning. In this work,
we propose the FUNN-MG pipeline (Figure 1), which provides a functional and
visual analytics system for the identification and exploration of the key functions
of a microbial community.</p>
      <p>The pipeline has four main tasks (the rounded rectangles in Figure 1) that
must be executed sequentially: i) identification of the metabolic pathways, ii)
evaluation of the enriched pathways, iii) detection of strong components
(clusters) and iv) visualization of the microbial gene-pathway network. The first three
steps are related to the ML part of the strategy, while the remaining step deals
with the visual analytics of the graph patterns extracted in the previous steps.
In the next section we discuss each of these steps, leaving a separate section
for the visualization strategy.</p>
      <p>Materials and Methods</p>
      <p>The metagenomic experimental data</p>
      <p>
        The metagenomic data selected for our experimental study is the Acid Mine
Drainage (AMD) biofilm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], freely available from the NCBI site (http://www.ncbi.nlm.nih.gov/books/NBK6860/). This biofilm
sequencing project was designed to explore the distribution and diversity of
metabolic pathways in acidophilic biofilms. Acidophilic biofilms are self-sustaining
communities that grow in the deep subsurface and receive no significant inputs
of fixed carbon or nitrogen from external sources. While some AMD is caused
by the oxidation of rocks rich in sulfide minerals, this is a very slow process
and most AMD is due directly to microbial activity. The AMD metagenome was
assembled into 2425 contigs distributed among five main species (see Table 1).
      </p>
      <p>
        More information regarding the AMD study as well as environmental
sequences, metadata and analysis can be obtained at [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
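      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Species name</th><th>Number of contigs</th></tr>
          </thead>
          <tbody>
            <tr><td>Ferroplasma acidarmanus Type I</td><td></td></tr>
            <tr><td>Ferroplasma sp. Type II</td><td></td></tr>
            <tr><td>Leptospirillum sp. Group II 5-way CG</td><td></td></tr>
            <tr><td>Leptospirillum sp. Group III</td><td></td></tr>
            <tr><td>Thermoplasmatales archaeon Gpl</td><td></td></tr>
          </tbody>
        </table>
      </table-wrap>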
      <p>Preprocessing of the metagenomic sequences</p>
      <p>
        We used the KAAS tool [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for the identification of 477 microbial genes.
This identification was based on the nucleotide percent homology of the groups
of orthologous genes found in the KEGG database [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        The search for microbial genes was carried out in several steps. First, the
metagenomic data was split into several groups according to Table 1, followed
by a validation stage of each group within the corresponding species in the KEGG
database [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The KAAS tool was then employed sequentially in four steps (Table 2) to
obtain the final set of 477 genes:
– Step 1, finding groups of orthologous genes: for each species in the
AMD sample we search for all its orthologous genes in the KEGG database.
(Orthologous genes are genes in different species that originated by vertical descent
from a single gene of the last common ancestor; see the Homology section on Wikipedia.) For
example, the AMD species Ferroplasma acidarmanus Type I and Type II are
named in KEGG as Ferroplasma acidarmanus. So, we use the 530 contigs
of the associated AMD species as a reference in the KAAS tool, retrieving
290 orthologous genes;
– Step 2, identifying associated species in KEGG: this step filters out
orthologous genes that are not associated with the reference species. Continuing
the example from Step 1, only 226 genes were kept for the Ferroplasma
acidarmanus species;
– Step 3, getting functional annotation in KEGG: this step retrieves the genes
associated with pathways in KEGG by using the gene list obtained in Step
2. For instance, 149 genes were retrieved for the Ferroplasma acidarmanus
species;
– Step 4, eliminating duplicated genes: since a gene may be
associated with more than one pathway, we deduplicate the genes found in Step 3. So,
for the Ferroplasma acidarmanus species we obtained 119 genes.
All the steps above were executed for all reference species in the AMD
sample, taking into account their associated target species in the KEGG database. In
Figure 2 we present this association as well as the distribution of the genes
found in the related metagenome.
      </p>
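      <p>As a minimal sketch of the filtering logic in Steps 2 to 4, the Python fragment below applies the three filters to a handful of hypothetical annotation records. The record layout, the gene identifiers and the function name are illustrative assumptions, not KAAS output, and Step 1 (the orthology search itself) is not modelled.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical (gene, KEGG organism code, pathway or None) records standing in
# for the orthologous genes returned by Step 1; none of these are real KAAS hits.
hits = [
    ("geneA", "fac", "Glutathione metabolism"),
    ("geneA", "fac", "Carbon metabolism"),
    ("geneB", "fac", None),
    ("geneC", "tvo", "Purine metabolism"),
    ("geneD", "fac", "Purine metabolism"),
]


def filter_for_species(hits, organism):
    # Step 2: keep only hits associated with the reference species.
    in_species = [h for h in hits if h[1] == organism]
    # Step 3: keep only hits that carry a pathway annotation.
    annotated = [h for h in in_species if h[2] is not None]
    # Step 4: a gene appears once per pathway, so collapse the repeats.
    unique_genes = sorted({gene for gene, _, _ in annotated})
    return in_species, annotated, unique_genes


in_species, annotated, unique_genes = filter_for_species(hits, "fac")
print(len(in_species), len(annotated), unique_genes)
```
]]></preformat>
      <p>In this toy input the gene count shrinks at every step, mirroring the way the per-species counts decrease from Step 2 to Step 4.</p>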
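      <table-wrap>
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Id</th><th>Species identified</th><th>Step 1</th><th>Step 2</th><th>Step 3</th><th>Step 4</th></tr>
          </thead>
          <tbody>
            <tr><td>fac</td><td>Ferroplasma acidarmanus</td><td>290</td><td>226</td><td>149</td><td>119</td></tr>
            <tr><td>lfc</td><td>Leptospirillum ferrooxidans</td><td>450</td><td></td><td></td><td></td></tr>
            <tr><td>lfi</td><td>Leptospirillum ferriphilum</td><td>44</td><td></td><td></td><td></td></tr>
            <tr><td>tac</td><td>Thermoplasma acidophilum</td><td>26</td><td></td><td></td><td></td></tr>
            <tr><td>tar</td><td>Thermoplasmatales archaeon</td><td>412</td><td></td><td></td><td></td></tr>
            <tr><td>tvo</td><td>Thermoplasma volcanium</td><td>11</td><td></td><td></td><td></td></tr>
            <tr><td></td><td>Genes</td><td>1233</td><td></td><td></td><td>477</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The “KEGGREST” R package [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] was applied to the list of 477
identified genes, highlighting 95 pathways for the AMD metagenome. At this
step, however, we cannot yet assume any strong evidence of functional enrichment
for the identified genes.
      </p>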
      <p>Functional enrichment analysis</p>
      <p>
        We devised a functional enrichment strategy based on [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], in which contingency
tables are set up in order to apply Fisher’s exact test for the statistical
significance of the enriched metabolic pathways. Fisher’s exact test (http://en.wikipedia.org/wiki/Fishers_exact_test) is one of a
class of exact tests, so called because the significance of the deviation from a
null hypothesis (e.g., the p-value) can be calculated exactly, rather than relying on
an approximation that becomes exact only in the limit as the sample size grows to
infinity, as with many statistical tests.
      </p>
      <p>The main challenge in evaluating the enrichment of a metabolic pathway
is the calculation of the probability of finding species covered by each pathway
across samples, given that, in the end, only a selected group of species will have
an associated pathway. This is also due to the fact that species play distinct roles
in the microbial community. As an example, the metabolic pathway Glutathione
metabolism is annotated for five out of the six species identified in the samples
(Table 2): Ferroplasma acidarmanus, Leptospirillum ferrooxidans, Leptospirillum
ferriphilum, Thermoplasma acidophilum and Thermoplasma volcanium. So,
KEGGREST will only take into account these five species for the enrichment score
(Fisher’s exact test).</p>
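      <table-wrap>
        <label>Table 3</label>
        <table>
          <thead>
            <tr>
              <th></th>
              <th>Gene associated with a pathway</th>
              <th>Gene not associated with a pathway</th>
              <th>Total genes</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>Sample</td><td>a (6)</td><td>b (364)</td><td>a+b (370)</td></tr>
            <tr><td>Population</td><td>c (15)</td><td>d (2768)</td><td>c+d (2783)</td></tr>
            <tr><td>Total in KEGG</td><td>a+c (21)</td><td>b+d (3132)</td><td>n (3153)</td></tr>
          </tbody>
        </table>
      </table-wrap>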
      <p>In Table 3 we present the contingency table required to calculate the
enrichment of the Glutathione metabolism pathway with respect to the microbial genes
found in the samples and their corresponding annotations in KEGG. Given this
table, we use the phyper function in the “stats” R package for the enrichment
score, followed by a test of significance using the “Fisher’s exact test for count
data” R function (fisher.test). Finally, we obtained an enrichment score of 0.0077 (p-value =
0.0292) for the Glutathione metabolism pathway.</p>
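      <p>The calculation on Table 3 can be sketched with the hypergeometric distribution that underlies both phyper and the one-sided Fisher’s exact test. The Python fragment below is a minimal illustration, not the pipeline’s R code. It assumes that the enrichment score is the upper tail P(X &gt; a) and that the p-value is the one-sided tail P(X &#8805; a); this reading, like the function names, is our assumption and is not stated in the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when `draws` genes are drawn without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def upper_tail(k_min, population, successes, draws):
    """P(X >= k_min) under the hypergeometric distribution."""
    k_max = min(successes, draws)
    return sum(hypergeom_pmf(k, population, successes, draws)
               for k in range(k_min, k_max + 1))


# Counts taken from Table 3 (Glutathione metabolism).
a, b, c, d = 6, 364, 15, 2768
n = a + b + c + d        # 3153 genes in total
pathway_genes = a + c    # 21 genes annotated with the pathway
sample_genes = a + b     # 370 genes found in the sample

# Assumed reading: score = P(X > a), p-value = P(X >= a).
score = upper_tail(a + 1, n, pathway_genes, sample_genes)
p_value = upper_tail(a, n, pathway_genes, sample_genes)
print(round(score, 4), round(p_value, 4))
```
]]></preformat>
      <p>Under this reading, the two printed values should come out close to the enrichment score and p-value reported above for Glutathione metabolism.</p>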
      <p>After completing the functional analysis for the 95 metabolic pathways, we
obtained a list with only 11 enriched pathways (p-value ≤ 0.05; see Table 4),
corresponding to 329 genes. Furthermore, we explore functional modules
presenting strong gene interactions by means of a bipartite graph
structure MGP = (G, P, E). We call this bipartite graph the Microbial Gene Pathway
(Figure 3a). The MGP vertices are divided into two disjoint sets, (G)enes and
(P)athways, such that every edge (E) connects a vertex in (G) to one in (P).
The enrichment score is annotated on each vertex in (P).</p>
      <table-wrap>
        <label>Table 4</label>
        <table>
          <thead>
            <tr><th>Function</th><th>Enrichment</th><th>p-value</th></tr>
          </thead>
          <tbody>
            <tr><td>Purine metabolism</td><td>0.033</td><td>0.04</td></tr>
            <tr><td>Geraniol degradation</td><td>6.95e-05</td><td>0.01</td></tr>
            <tr><td>Cyanoamino acid metabolism</td><td>0.008</td><td>0.05</td></tr>
            <tr><td>Glutathione metabolism</td><td>0.007</td><td>0.02</td></tr>
            <tr><td>Porphyrin and chlorophyll metabolism</td><td>0.023</td><td>0.03</td></tr>
            <tr><td>Metabolic pathways</td><td>0.0002</td><td>0.0003</td></tr>
            <tr><td>Microbial metabolism in diverse environments</td><td>0.042</td><td>0.05</td></tr>
            <tr><td>Carbon metabolism</td><td>0.039</td><td>0.05</td></tr>
            <tr><td>Biosynthesis of amino acids</td><td>0.017</td><td>0.02</td></tr>
            <tr><td>RNA degradation</td><td>0.01</td><td>0.03</td></tr>
            <tr><td>Nucleotide excision repair</td><td>0.01</td><td>0.03</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Finding gene clusters</p>
      <p>Several groups of genes interact in microbial communities, and some of these
interactions are stronger than others. In addition, these interactions usually
correlate with the environment in which the organisms live. We call these strong gene
interactions community patterns, and they may play a key role in
the stability of the microbial genetic network. We hypothesize that any
perturbation in such patterns could directly affect the maintenance of the
network. We propose a structural graph clustering strategy which takes into
account a bipartite graph (MGP).</p>
      <p>The structural graph clustering uses a community matrix (Figure 3b) based
on the genes and their enriched pathways represented in MGP. The community
matrix captures three main aspects of gene-to-gene interactions:
– the existence of one or more metabolic pathways shared by the genes;
– the number of metabolic pathways in which the genes take part;
– the enrichment score associated with each metabolic pathway.</p>
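      <p>The exact construction of the community matrix is not spelled out here, so the Python fragment below is only one plausible sketch of a matrix that reflects the three aspects listed above. It assumes one row per gene and one column per enriched pathway, with a cell weight of -log10(score) when the gene takes part in the pathway and 0 otherwise. The weighting, the function name and the gene memberships are hypothetical; only the pathway names and enrichment scores are taken from Table 4.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import log10

# Enrichment scores copied from Table 4; the gene memberships are hypothetical.
mgp = {
    "Glutathione metabolism": (0.007, {"g1", "g2"}),
    "Purine metabolism": (0.033, {"g2", "g3"}),
    "Carbon metabolism": (0.039, {"g3", "g4"}),
}


def community_matrix(mgp):
    """Return (genes, pathways, rows); rows[i][j] weights gene i in pathway j."""
    pathways = sorted(mgp)
    genes = sorted(set().union(*(members for _, members in mgp.values())))
    rows = []
    for gene in genes:
        row = []
        for pathway in pathways:
            score, members = mgp[pathway]
            # Assumed weighting: -log10(score), so smaller scores weigh more;
            # 0.0 marks "gene does not take part in this pathway".
            row.append(-log10(score) if gene in members else 0.0)
        rows.append(row)
    return genes, pathways, rows


genes, pathways, rows = community_matrix(mgp)
for gene, row in zip(genes, rows):
    print(gene, [round(w, 2) for w in row])
```
]]></preformat>
      <p>With this layout, genes that share pathways have non-zero weights in the same columns, the number of non-zero cells in a row counts the pathways of a gene, and the cell values carry the enrichment score, so row-wise distances respond to all three aspects.</p>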
      <p>
        The MGP bipartite graph is an interesting computational structure for both
the application of ML techniques and the interactive visualization of the microbial
genetic network [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The community patterns are obtained directly by applying
a hierarchical clustering technique (the hclust() R function) to the
community matrix. The hierarchical clustering solution (Figure 4) requires a
Euclidean distance matrix, which can be built directly from the community
matrix. From a biological perspective, the identification of these strong interactions
allows for a better understanding of the mechanisms by which these complex
networks control cell processes, making it possible to interfere in such processes
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
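      <p>To make the clustering step concrete, the Python fragment below sketches agglomerative clustering over Euclidean distances between matrix rows. It is a simplified stand-in, not the hclust() implementation: it assumes complete linkage and stops merging at a fixed height, and the function name, the height value and the input rows are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from math import dist


def complete_linkage(points, height):
    """Merge clusters until the closest pair is farther apart than `height`."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d > height:
            break
        # Merge the closest pair; j is always greater than i.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical community-matrix rows (one per gene, one column per pathway).
rows = [
    [2.15, 0.00],
    [2.15, 1.48],
    [0.00, 1.41],
    [0.00, 1.48],
]
clusters = complete_linkage(rows, height=1.6)
print(clusters)
```
]]></preformat>
      <p>Each returned list holds the row indices of one cluster. Raising the height merges more genes into fewer clusters, which is why the choice of cutting rule matters for the resulting community patterns.</p>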
      <p>
        The branches of the hierarchical clustering dendrogram correspond to
community patterns and can be identified using one of a number of available branch
cutting methods, for example the constant-height cut or the two Dynamic Branch
Cut methods. One drawback of hierarchical clustering is that it can be difficult
to determine how many (if any) clusters are present in the data set. We employed
the Dynamic Tree Cut R package to obtain robust clusters [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Although the
height and shape parameters of the Dynamic Tree Cut method provide
improved flexibility for branch cutting and module detection, it remains an open
research question how to choose optimal cutting parameters or how to estimate
the number of clusters in the data set. Two cutting strategies were explored with
the Dynamic Tree Cut:
– Dynamic tree: the algorithm implements an adaptive, iterative process of
cluster decomposition and combination and stops when the number of
clusters becomes stable. To avoid over-splitting, very small clusters are joined
to their neighboring major clusters;
– Dynamic hybrid: the algorithm can be considered a hybrid of hierarchical
clustering and modified Partitioning Around Medoids (PAM), since it
involves assigning objects to their closest medoids.
      </p>
      <p>Given that we were looking for compact clusters, we decided to use the cutting
result obtained with the Dynamic hybrid approach. Thus, 9 clusters and 10
nested subclusters were enumerated. All clusters have the prefix “NT” followed
by a sequential number (Table 5). The nested subclusters were calculated with
the help of the RedeR R package, which also offered an interesting alternative for
the interactive visualization of the microbial genetic networks.</p>
      <p>In summary, 308 genes were clustered, corresponding to 96.61% of the
enriched pathways related to the AMD biofilm. These clusters enclose on average 30
genes, with 6 genes in the most compact cluster and 128 in the largest one.
Next, we explore the visual analytics system over the MGP bipartite graph,
which allows free manipulation of the community patterns as well as the exploration
of key hub genes and pathways inside this microbial network.</p>
      <p>Results and discussion</p>
      <p>Visual analytics system</p>
      <p>
        Given the linked information associated with the concept of microbial
communities, it is strongly advised to explore it by graph visualization [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The MGP
bipartite graph fits the graph structure required for visualization by
the RedeR R package. This network visualization system offers several
interactive graph functions such as zoom, pan, neighborhood highlighting, search,
flows, labeling, and the addition and deletion of graph components.
      </p>
      <p>The structural visualization of the enriched Microbial Gene Pathway is
presented in Figure 5. The visualization model allows the identification of genes
across species and pathways, depicted in distinct colors. It is also possible to
explore the degree of connectivity by inspecting the size of the vertices; key players
are identified by neighborhood highlighting when clicking on a particular node
in the graph. Such an interactive experience allows one to explore resilient
aspects of the enriched microbial gene pathway.</p>
      <p>The community patterns are explored through the visualization of the graph
components associated with the clusters and subclusters (Figure 6).
Furthermore, it is also possible to inspect particular spots as well as to identify hub
genes, modules or pathways within the network. As an example, the modules
are explored as (nested) clusters detected by the proposed pipeline. The
Subgroup row in Table 5 identifies these nested clusters. Thus, for the
Group “NT2” we observe a total of 22 genes distributed among the six species
(the column headers, as described above). NT1 is an example of a nested cluster,
having 1 gene plus the 22 genes from NT2, for a total of 23 genes. The
symbol “–” shows that there is no nested cluster for that Group.</p>
      <p>Fig. 5. The enriched Microbial Gene Pathway Network. The legend
of the species and associated pathways is shown at the bottom left. Nodes (circles) represent
either species genes or pathways. The degree connectivity scale of
all nodes is shown at the upper left. The highlighted nodes (yellow) are all genes associated with the Carbon
metabolism pathway (indicated by the orange arrow).</p>
      <p>As an illustration of the visualization, the nested cluster “NT3”, with 30
genes, is depicted in the middle of Figure 6. As can be observed, the nested
cluster “NT3” has 1 gene plus the 29 genes from “NT4”. The most abundant
species is “lfc” (colored in brown). Finally, the eleven enriched
pathways (colored in green) are shown connecting all the enumerated nested clusters.</p>
    </sec>
    <sec id="sec-2">
      <title>Conclusions</title>
      <p>The enrichment analysis of microbial genetic networks poses an interesting
computational challenge. It is not practical to enumerate all gene-to-gene interactions
of a microbial community, so pathway-centric analysis seems a promising
strategy to tame this combinatorial problem. This strategy is based on
unsupervised machine learning over a bipartite graph built to
evaluate the enriched microbial gene pathways.</p>
      <p>Interactive visualization of the resulting microbial gene pathway networks
allows for the exploration of network metrics, enhancing the enrichment
analysis. Once all the topological network aspects are understood for a particular
metagenome, we envisage the possibility of using such profiles for metagenome
comparison as well as for the classification of unknown microbial genetic networks.</p>
    </sec>
    <sec id="sec-3">
      <title>Authors’ contributions</title>
      <p>LC and RA performed the analysis and developed the pipeline. RA and CC
supervised the study. LC, RA, CC and LT wrote the manuscript.</p>
      <p>This work is partially supported by the Brazilian National Research Council
(CNPq – Universal calls) under the BIOFLOWS project [475620/2012-7].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hugenholtz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tyson</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          : Microbiology: Metagenomics.
          <source>Nature</source>
          <volume>455</volume>
          (
          <issue>7212</issue>
          ) (
          <year>September 2008</year>
          )
          <fpage>481</fpage>
          -
          <lpage>483</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Fierer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leff</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>B.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nielsen</surname>
            ,
            <given-names>U.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bates</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lauber</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Owens</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gilbert</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wall</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caporaso</surname>
            ,
            <given-names>J.G.</given-names>
          </string-name>
          :
          <article-title>Cross-biome metagenomic analyses of soil microbial communities and their functional attributes</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>109</volume>
          (
          <issue>52</issue>
          ) (
          <year>December 2012</year>
          )
          <fpage>21390</fpage>
          -
          <lpage>21395</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bäckhed</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ley</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sonnenburg</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peterson</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gordon</surname>
            ,
            <given-names>J.I.</given-names>
          </string-name>
          :
          <article-title>Host-Bacterial Mutualism in the Human Intestine</article-title>
          .
          <source>Science</source>
          <volume>307</volume>
          (
          <issue>5717</issue>
          ) (
          <year>March 2005</year>
          )
          <fpage>1915</fpage>
          -
          <lpage>1920</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Johnston</surname>
            ,
            <given-names>C.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wyatt</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ibrahim</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shuster</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Southam</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magarvey</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>Gold biomineralization by a metallophore from a gold-associated microbe</article-title>
          .
          <source>Nat Chem Biol</source>
          advance online publication (
          <year>February 2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Wooley</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Godzik</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedberg</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>A Primer on Metagenomics</article-title>
          .
          <source>PLoS Comput Biol</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ) (
          <year>February 2010</year>
          ) e1000667+
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. NCBI:
          <article-title>Metagenomics: Sequences from the environment [internet]. Sequences from the Environment</article-title>
          ,
          <source>Tyson</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Tyson</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hugenholtz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allen</surname>
            ,
            <given-names>E.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ram</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solovyev</surname>
            ,
            <given-names>V.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rokhsar</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banfield</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>Community structure and metabolism through reconstruction of microbial genomes from the environment</article-title>
          .
          <source>Nature</source>
          <volume>428</volume>
          (
          <issue>6978</issue>
          ) (March
          <year>2004</year>
          )
          <fpage>37</fpage>
          -
          <lpage>43</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Moriya</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Itoh</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Okuda</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoshizawa</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanehisa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>KAAS: an automatic genome annotation and pathway reconstruction server</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>35</volume>
          (
          <issue>Web Server issue</issue>
          ) (July
          <year>2007</year>
          )
          <fpage>W182</fpage>
          -
          <lpage>W185</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kanehisa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goto</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sato</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furumichi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanabe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>KEGG for integration and interpretation of large-scale molecular data sets</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>40</volume>
          (
          <issue>Database issue</issue>
          ) (January
          <year>2012</year>
          )
          <fpage>D109</fpage>
          -
          <lpage>D114</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Tenenbaum</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>KEGGREST: Client-side REST access to KEGG</article-title>
          .
          <source>R package version 1.0.1.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sreenivasaiah</surname>
            ,
            <given-names>P.K.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rani</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cayetano</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arul</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.H.</given-names>
          </string-name>
          :
          <article-title>IPAVS: Integrated Pathway Resources, Analysis and Visualization System</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>40</volume>
          (
          <issue>Database issue</issue>
          ) (January
          <year>2012</year>
          )
          <fpage>D803</fpage>
          -
          <lpage>D808</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Goh</surname>
            ,
            <given-names>K.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cusick</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Valle</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Childs</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barabási</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          :
          <article-title>The human disease network</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>104</volume>
          (
          <issue>21</issue>
          ) (May
          <year>2007</year>
          )
          <fpage>8685</fpage>
          -
          <lpage>8690</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lehninger</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Principios da bioquímica. 5 edn</article-title>
          . Volume
          <volume>1</volume>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Langfelder</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horvath</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R</article-title>
          .
          <source>Bioinformatics</source>
          <volume>24</volume>
          (
          <issue>5</issue>
          ) (
          <year>2008</year>
          )
          <fpage>719</fpage>
          -
          <lpage>720</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Herman</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Melançon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marshall</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          :
          <article-title>Graph visualization and navigation in information visualization: A survey</article-title>
          .
          <source>IEEE Transactions on Visualization and Computer Graphics</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          ) (January
          <year>2000</year>
          )
          <fpage>24</fpage>
          -
          <lpage>43</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>