<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Unusual Distribution Structure of the Cyanobacteria Photosystem Genes in the Frequency Space of Triplets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maria Yu. Senashova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computational Modelling of the Siberian Branch of the Russian Academy of Sciences</institution>
          ,
          <addr-line>50/44 Akademgorodok, Krasnoyarsk, 660036</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>25</volume>
      <issue>2021</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Genes of photosystems I and II are isolated from 45 cyanobacterial genomes. A frequency dictionary is built for each gene, which assigns to it a point in the 64-dimensional space of triplet frequencies. The structure formed by the photosystem genes in this space is considered. The genes are found to cluster according to whether they belong to the forward or the reverse strand. Moreover, the points belonging to the forward and reverse strands form two perpendicular planes. Within the main clusters, the genes are grouped according to the type of bacteria. The values of the gene GC-content are distributed along a gradient.</p>
      </abstract>
      <kwd-group>
        <kwd>Order</kwd>
        <kwd>distribution</kwd>
        <kwd>clustering</kwd>
        <kwd>evolution</kwd>
        <kwd>triplets</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Photosynthesis, the conversion of solar energy into biomass, is one of the most fundamental
processes on the Earth. Only photoautotrophic organisms such as cyanobacteria and plants can use
photons to break down water molecules into hydrogen and molecular oxygen. The photosynthetic
system of cyanobacteria, in contrast to purple and green bacteria, consists of two subsystems:
photosystem I and photosystem II. The functions of these systems are mutually complementary. The
primary function of photosystem II is to generate a strong oxidant, which initiates the oxidation of
water and transfers its electrons to a membrane carrier. The primary function of photosystem I is to
saturate these low-level electrons with energy in order to reduce NADP+. Since the total energy of the
process is too great for a single reaction center, two photosystems appeared in
the course of evolution, in which different parts of this reaction occur. Their specific functions determine
the features of their structure. Thus, photosystem I is symmetric, i.e. there are two branches of electron
transport, which makes it much faster. In contrast, photosystem II is asymmetric and has only one
working branch, which slows down the transport of electrons but makes it more controllable.
Recently, significant progress has been made in determining the spatial structures of the photosystems
of various cyanobacteria, and these structures continue to attract the interest of researchers. The natural
location of photosystem I (PSI), photosystem II (PSII), cytochrome (Cyt) b6f, and ATP synthase
within thylakoid membranes at the molecular level was visualized. An inhomogeneous distribution of
these four photosynthetic complexes was revealed and their dynamic features in a dense membrane
environment were determined [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Structural studies of PSII of cyanobacteria led to the creation of a
high-resolution spatial model of this huge complex. It was shown that the monomer of the
photocomplex consists of 20 protein subunits, 54 pigment molecules, and 25 molecules of
incorporated lipids. Mechanisms of operation of mobile electron carriers were proposed based on the
structural data. The system for matching cluster atoms to the protein environment was described in
detail [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. New biochemical separation methods and structural characteristics of intermediate PSII
complexes provided new insight into their protein composition and the spatial distribution of these
complexes in the cell. The idea of the coordination of protein bonds and the process of PSII assembly
was presented [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The role of the PsbU gene included in PSII was considered. It was found to be
crucial for the stable architecture of the water-splitting system, which
optimizes the efficiency of the oxygen production process [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For the structure of PSI and PSII of
cyanobacteria, a comparison was made with the photosynthetic systems of higher plants, and
assumptions on the evolution of the photosystem were made [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The functions and structure of the
Psb27, Psb28, and Ycf48 hydrophilic assembly factors were discussed using structural, biochemical,
and physiological information. A review was made of the role of these protein factors in the
cyanobacterial assemblies of PSII, emphasizing their participation both in biogenesis and restoration
of the photosystem from photodamage [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The new structures of PSI and PSII of cyanobacteria,
algae, and plants shed light on the architecture and mechanism of action of these intricate
membrane complexes and on the evolutionary forces shaping oxygenic photosynthesis [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A review of
the general structure of PSII was presented, followed by a detailed description of the specific structure
of the catalytic center for water oxidation of the Mn4CaO5 cluster and its protein environment [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. A
comparison was made between phycobilisomes and PSII dimers. The most probable locations of
terminal emitter subunits ApcD and ApcE inside lower cylinders of the PBS nucleus were
determined, and chlorophyll PSII molecules collecting energy from PBS were identified [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In all
these studies, the structure of the photosystem of cyanobacteria is considered from the viewpoint of
biophysics and biochemistry. In the present work, the structure of the photosystem of cyanobacteria is
considered from the viewpoint of bioinformatics, namely as the clustering of the points that correspond to the genes of
the photosystem in the frequency space of triplets.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Material and methods</title>
      <p>In the present study, frequency dictionaries are used to represent genes as points in the frequency
space of triplets. Let us describe in more detail the procedure for constructing the dictionaries and
identifying their structuredness. Genetic sequences of length L over the alphabet
{A, C, G, T} are considered. In our case, the genes of the photosystem act as the genetic
sequences. For each sequence, a frequency dictionary of thickness 3 is compiled. The
frequency dictionary of thickness 3 is the list of all the triplets ν<sub>1</sub>ν<sub>2</sub>ν<sub>3</sub> of consecutive
nucleotides together with the frequencies of these triplets. There can be 64 triplets in total. The
frequency f is the ratio of the number of copies n of a given triplet to the total number of all the
triplets N, where N is the sum of all n:
        <disp-formula><label>(1)</label><tex-math><![CDATA[f = \frac{n}{N}]]></tex-math></disp-formula>
      </p>
      <p>The triplets within a gene are defined so that they do not overlap, while together
they cover the entire gene sequence. In other words, the reading frame used to
construct the dictionary is shifted by three nucleotides rather than by one. The
dictionary thus maps each gene to a point in a 64-dimensional metric space. Two genes are
considered to be close if the corresponding points in the 64-dimensional space are close in
the Euclidean metric.</p>
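      <p>The construction described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names are hypothetical, triplets containing symbols other than A, C, G, T are assumed to be skipped, and trailing nucleotides that do not fill a complete triplet are assumed to be ignored.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"
# All 64 triplets in a fixed order, so that every gene uses the same coordinates.
TRIPLETS = ["".join(t) for t in product(ALPHABET, repeat=3)]


def triplet_frequencies(sequence):
    """Return the 64-component frequency vector of non-overlapping triplets."""
    seq = sequence.upper()
    counts = dict.fromkeys(TRIPLETS, 0)
    # Step of 3: the reading frame is shifted by three nucleotides, not one.
    for i in range(0, len(seq) - 2, 3):
        triplet = seq[i:i + 3]
        if triplet in counts:  # assumption: ambiguous triplets are skipped
            counts[triplet] += 1
    total = sum(counts.values())  # N, the sum of all n
    if total == 0:
        raise ValueError("sequence contains no valid triplets")
    return [counts[t] / total for t in TRIPLETS]  # f = n / N


def euclidean_distance(u, v):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


f1 = triplet_frequencies("ATGGCAATGTAA")
f2 = triplet_frequencies("ATGGCGATGTGA")
print(euclidean_distance(f1, f2))
```
]]></preformat>
      <p>The two toy sequences share the triplet ATG and differ in the remaining two triplets, so their points lie at a nonzero distance from each other. Real genes would be read from annotated genome records rather than typed in.</p>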
      <p>Thus, each gene is assigned a point in the 64-dimensional space of triplets. The following
parameters are associated with each point: the name of the gene, the name of the species to which the
gene belongs, the gene strand type (forward or reverse), and the GC-content of the gene. The data are
visualized in the space of the first three principal components, calculated for the 64-dimensional space of
triplets from the obtained set of points in the VidaExpert program
(http://bioinfoout.curie.fr/projects/vidaexpert/). The projections onto the planes of the first and second, as
well as the second and third, principal components are considered.
      </p>
    </sec>
    <sec id="sec-results">
      <title>3. Results and discussion</title>
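      <p>The projection onto the first three principal components described in the methods was carried out in VidaExpert. As a rough illustration only, the same kind of projection can be sketched with NumPy using a singular value decomposition. The function name and the random placeholder data are assumptions and do not reproduce the data of this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def project_to_principal_components(points, n_components=3):
    """Project row vectors onto the first principal components (PCA via SVD)."""
    x = np.asarray(points, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T, singular_values[:n_components]


# Placeholder data: 20 random 64-component frequency vectors (each sums to 1).
rng = np.random.default_rng(0)
demo = rng.dirichlet(np.ones(64), size=20)
coords, _ = project_to_principal_components(demo)
print(coords.shape)  # (20, 3)
```
]]></preformat>
      <p>Each row of the result holds the coordinates of one gene in the space of the first three principal components; plotting the first against the second column, or the second against the third, gives the two projections mentioned in the methods.</p>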
      <p>Forty-five cyanobacterial genomes from the EMBL bank have been examined, and the genes of the
photosynthetic system have been annotated. The genes belonging to the forward and reverse strands
have been found to form two clusters containing similar numbers of points; that is, the
numbers of genes on the forward and reverse strands are approximately equal. Moreover, the points
belonging to the same strand are well approximated by a plane embedded in the space. In addition, the planes
corresponding to the two strands are perpendicular (Figure 1).</p>
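      <p>One way to quantify the statement that the two strands lie on perpendicular planes is to fit a least-squares plane to each group of points in the space of the first three principal components and to measure the angle between the fitted planes. The sketch below is an assumption about how such a check could be made, not the procedure used in this study; the function names and the synthetic points are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def fit_plane_normal(points):
    """Unit normal of the least-squares plane through a 3-D point cloud."""
    p = np.asarray(points, dtype=float)
    centered = p - p.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the normal of the best-fit plane.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[-1]


def angle_between_planes(points_a, points_b):
    """Angle in degrees (0 to 90) between the planes fitted to two clouds."""
    n_a, n_b = fit_plane_normal(points_a), fit_plane_normal(points_b)
    cosine = min(1.0, abs(float(n_a @ n_b)))
    return float(np.degrees(np.arccos(cosine)))


# Synthetic check: one cloud in the plane z = 0, the other in the plane x = 0.
rng = np.random.default_rng(1)
uv = rng.normal(size=(50, 2))
forward = np.column_stack([uv[:, 0], uv[:, 1], np.zeros(50)])
reverse = np.column_stack([np.zeros(50), uv[:, 0], uv[:, 1]])
print(angle_between_planes(forward, reverse))  # close to 90 for these planes
```
]]></preformat>
      <p>For real data, the quality of each fit could additionally be judged from the smallest singular value relative to the other two.</p>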
      <p>At the same time, genes of the same type do not form dense clusters; instead, they are spread
along the clusters of the forward and reverse strands (Figure 2).</p>
      <p>
        In contrast, the genes belonging to cyanobacteria of the same species form clusters within the
strands. In Figure 3, the genes of Gloeobacter kilaueensis (EMBL bank ID
CP003587) and Gloeobacter violaceus (BA000045) are shown in crimson, those of Synechocystis
(AP012205, AP012276, AP012277, AP012278, BA000022, CP003265) in purple, those of Nostoc
(CP003552, BA000019, CP003548, CP001037) in turquoise, and those of Prochlorococcus marinus
(CP000551, CP00552, CP000553) in light green.
      </p>
      <p>
        The distribution of the gene GC-content values has also been analyzed in the space of the first
three principal components. The GC-content values have been found to increase
from the bottom (points shown in green) to the top (points shown in red)
(Figure 4). Points with average values are marked in yellow. Thus, the GC-content values of the
cyanobacterial photosystem genes are distributed along a gradient in the space of triplet
frequencies. This type of distribution is common in complete genomes. In particular, it is found in
the genomes of chloroplasts [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], in the genomes of mitochondria of higher plants, algae, mosses,
lichens, fungi [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and in GC-rich bacteria.
      </p>
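      <p>The GC-content gradient was identified here from the coloring of the points. A simple numerical counterpart, given purely as an assumed illustration, is to compute the GC-content of each gene and correlate it with the coordinate along one axis of the projection. The function names, the toy sequences, and the coordinate values below are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def gc_content(sequence):
    """Fraction of G and C among the A, C, G, T symbols of a sequence."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T symbols")
    return (seq.count("G") + seq.count("C")) / acgt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5  # undefined if either sample is constant


genes = ["ATATATGC", "ATGCGCAT", "GCGCGCAT"]  # toy sequences
gc_values = [gc_content(g) for g in genes]
axis_coordinate = [-1.0, 0.1, 0.9]  # hypothetical coordinates along one axis
print(gc_values, pearson(gc_values, axis_coordinate))
```
]]></preformat>
      <p>A correlation coefficient close to 1 or -1 along some direction would be consistent with a gradient distribution of the GC-content values along that direction.</p>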
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
      <p>In terms of the spatial distribution of the gene GC-content values, the structure formed by the
genes of the photosynthetic systems of cyanobacteria in the space of triplet frequencies is similar to the
structures previously studied for the complete genomes of chloroplasts, mitochondria, and bacteria.
Unlike the complete genomes, however, the genes of the cyanobacterial photosystems show a
pronounced clustering into forward-strand and reverse-strand groups. In addition, the points have been
found to group in the frequency space of triplets according to the organisms they belong to rather than
according to the type of gene.</p>
    </sec>
    <sec id="sec-4">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Casella</surname>
          </string-name>
          et al.,
          <article-title>Dissecting the native architecture and dynamics of cyanobacterial photosynthetic machinery</article-title>
          ,
          <source>Molecular Plant</source>
          <volume>10</volume>
          (
          <issue>11</issue>
          ) (
          <year>2017</year>
          )
          <fpage>1434</fpage>
          -
          <lpage>1448</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.molp.2017.09.019</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.G.</given-names>
            <surname>Gabdulkhakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Dontsova</surname>
          </string-name>
          ,
          <article-title>Structural studies on photosystem II of cyanobacteria</article-title>
          ,
          <source>Biochemistry</source>
          <volume>78</volume>
          (
          <issue>13</issue>
          ) (
          <year>2013</year>
          )
          <fpage>1524</fpage>
          -
          <lpage>1538</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1134/S0006297913130105</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Heinz</surname>
          </string-name>
          et al.,
          <article-title>Analysis of photosystem II biogenesis in cyanobacteria</article-title>
          ,
          <source>Biochimica et Biophysica Acta (BBA) - Bioenergetics</source>
          <volume>1857</volume>
          (
          <issue>3</issue>
          ) (
          <year>2016</year>
          )
          <fpage>274</fpage>
          -
          <lpage>287</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.bbabio.2015.11.007</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Inoue-Kashino</surname>
          </string-name>
          et al.,
          <article-title>PsbU provides a stable architecture for the oxygen-evolving system in cyanobacterial photosystem II</article-title>
          ,
          <source>Biochemistry</source>
          <volume>44</volume>
          (
          <issue>36</issue>
          ) (
          <year>2005</year>
          )
          <fpage>12214</fpage>
          -
          <lpage>12228</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1021/bi047539k</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Karapetyan</surname>
          </string-name>
          ,
          <article-title>Photosystem I of cyanobacteria: organization and functions</article-title>
          ,
          <source>Advances in Biological Chemistry</source>
          <volume>41</volume>
          (
          <year>2001</year>
          )
          <fpage>39</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kosarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Senashova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sadovsky</surname>
          </string-name>
          ,
          <article-title>Intrinsic Structuredness of Mitochondria Genomes</article-title>
          , in:
          <string-name>
            <given-names>L.</given-names>
            <surname>Nozhenkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Penkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Korobko</surname>
          </string-name>
          (Eds.),
          <source>The 1st Siberian Scientific Workshop on Data Analysis Technologies with Applications 2020</source>
          , volume
          <volume>2727</volume>
          of SibDATA'20,
          <publisher-name>CEUR</publisher-name>
          ,
          <year>2020</year>
          , Krasnoyarsk, Russia, pp.
          <fpage>66</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Mabbitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Wilbanks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Eaton-Rye</surname>
          </string-name>
          ,
          <article-title>Structure and function of the hydrophilic Photosystem II assembly proteins: Psb27, Psb28 and Ycf48</article-title>
          ,
          <source>Plant Physiology and Biochemistry</source>
          <volume>81</volume>
          (
          <year>2014</year>
          )
          <fpage>96</fpage>
          -
          <lpage>107</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1016/j.plaphy.2014.02.013</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Nelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Yocum</surname>
          </string-name>
          ,
          <article-title>Structure and function of photosystems I and II</article-title>
          ,
          <source>Annu. Rev. Plant Biol.</source>
          <volume>57</volume>
          (
          <year>2006</year>
          )
          <fpage>521</fpage>
          -
          <lpage>565</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1146/annurev.arplant.57.032905.105350</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Sadovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Yu.</given-names>
            <surname>Senashova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. V.</given-names>
            <surname>Malyshev</surname>
          </string-name>
          ,
          <article-title>Amazing symmetrical clustering in chloroplast genomes</article-title>
          ,
          <source>BMC Bioinformatics</source>
          <volume>21</volume>
          (
          <issue>Suppl 2</issue>
          ) (
          <year>2020</year>
          )
          <fpage>83</fpage>
          . doi:
          <pub-id pub-id-type="doi">10.1186/s12859-020-3350-z</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>The structure of photosystem II and the mechanism of water oxidation in photosynthesis</article-title>
          ,
          <source>Annual Review of Plant Biology</source>
          <volume>66</volume>
          (
          <year>2015</year>
          )
          <fpage>23</fpage>
          -
          <lpage>48</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1146/annurev-arplant-050312-120129</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Zlenko</surname>
          </string-name>
          et al.,
          <article-title>Coupled rows of PBS cores and PSII dimers in cyanobacteria: symmetry and structure</article-title>
          ,
          <source>Photosynthesis Research</source>
          <volume>133</volume>
          (
          <issue>1</issue>
          ) (
          <year>2017</year>
          )
          <fpage>245</fpage>
          -
          <lpage>260</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/s11120-017-0362-2</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>