<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Genesy: a Blockchain-based Platform for DNA Sequencing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roberto Carlini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Carlini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Dalla Palma</string-name>
          <email>s.dallapalma@uvt.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Remo Pareschi</string-name>
          <email>remo.pareschi@unimol.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Genesy Project</institution>
          ,
          <addr-line>Ferrara, Emilia Romagna 44121</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Jheronimus Academy of Data Science (JADS)</institution>
          ,
          <addr-line>Sint Janssingel 92 5211 DA 's-Hertogenbosch</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Molise</institution>
          ,
          <addr-line>C.da Fonte Lappone 86090 Pesche (IS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Advances in technology have drastically reduced the cost and turnaround time of Whole Genome Sequencing (WGS). Along with easy access to genomic data, WGS technology can significantly improve the productivity and efficiency of health care and social well-being, improve quality of life, and increase the possibility that people have a direct impact on their health. This paper proposes Genesy, an innovative blockchain platform that acts as an intermediary between the owners of genomic data and its potential users. Genesy structures a new ecosystem that incentivizes people to share their genomic data, leveraging blockchain technology to safeguard users' exclusive ownership of and access to their genomic data while allowing them to participate in the benefits and advances of genomic research.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>information of enormous general and clinical interest. Genotyping is the most economical DNA
sequencing service: genes are compared to a standard reference genome at 500,000 to a million
different points to identify the variants. It can explain why an individual has certain somatic
traits, shed light on their ancestors, and reveal some medical risks. Exome sequencing is
considerably more detailed and can give important medical and scientific information.</p>
      <p>Reduced costs and time, along with the rise of new digital technologies, are profoundly
changing the way we manage knowledge in health care, giving us access to data and making it
possible to link data and knowledge in ways that have so far been impracticable. At the same time,
they raise new challenges for privacy and data security, which often discourage the adoption
or development of certain projects.</p>
      <p>To address those challenges we propose Genesy, an innovative blockchain platform which
acts as an intermediary between the owner of the genomic data and its potential users (i.e.,
university research centers, private laboratories, hospitals, geneticists, etc.). Our proposal
envisions the use of blockchain, cloud computing and artificial intelligence as means to structure
a new ecosystem that ensures users' exclusive ownership of and access to their genomic data,
as well as the possibility of participating in the benefits of genomic research.</p>
    </sec>
    <sec id="sec-2">
      <title>Genesy Ecosystem: How Does it Work?</title>
      <p>The aim of Genesy is to foster collaboration among users and various organizations to promote
a high-level genomic ecosystem, thereby efficiently collecting and managing the large volumes
of data produced in sequencing activities. To this end, the following components have been
proposed.</p>
      <p>Kit for DNA collection. It consists of a small, light and thin vial that preserves dry
saliva intact for months with reduced logistics costs. It will be delivered to the customer
in simple personalized packaging carrying our brand. The vials will be regularly insured
and sent in batches to the sequencing centers, further containing shipping costs. Users
of the platform will receive their results within a few weeks. A service will regularly
notify users of the possibility to acquire new reports on their
DNA, which will be produced in step with the development of new genetic panels and
related medical research progress.</p>
      <p>Sequencing structures. Genesy will sequence DNA at various levels (complete genome
and genetic panels) in Italian and US laboratories using medium- to high-end machines,
covering areas such as nutrition, allergies, fitness, microbiome, etc., as well as diagnostic
panels, for example for autoimmune diseases and physical and neuropsychological traits.</p>
      <p>IBM technology infrastructure. The IBM Hyperledger blockchain platform<sup>1</sup> for
managing user meta-data is in turn integrated with the following services:
        <list list-type="bullet">
          <list-item><p>NoSQL database - to analyze and manage sequenced DNA data;</p></list-item>
          <list-item><p>IBM Watson - as a tool to create reports on data produced by sequencing;</p></list-item>
          <list-item><p>Cloud Object Storage - to store data and some off-genomic data;</p></list-item>
          <list-item><p>Stellar network - to manage exchange transactions through an ad-hoc
cryptocurrency.</p></list-item>
        </list></p>
      <p><sup>1</sup> Accessible online at https://www.ibm.com/blockchain/hyperledger</p>
      <sec id="sec-2-1">
        <title>Design Principles</title>
        <p>The design principles we propose for the platform to act as an intermediary between owners of
genetic data and its users can be exemplified as follows.</p>
        <p>Design Principle 1 - Whole Genome Sequencing information sharing for custom
recommendations. The value of genomic data lies in the possibility of identifying associations
between genetic variants and diseases. Risk factors and strengths can be identified in advance,
and the prevention, diagnosis and treatment of diseases can then be targeted to achieve the best
effect on each individual with a particular genetic makeup. The advantages are two-fold: (1) for
the national economy, it offers the possibility of significantly improving the productivity
and efficiency of health care and social well-being. Indeed, if genomic data are shared
with researchers, they can help to identify the causes of multiple diseases and contribute to the
development of new drugs; (2) on an individual level, it means improving the quality of life and
increasing the possibility that people have a direct impact on their health.</p>
        <p>Anonymous data will be shared and processed with advanced computational and
scientific analysis tools, also taking advantage of Big Data technologies. This way, access to data is
complete and interpretations of genomic data can be continuously updated. These conditions
point to the imminent adoption of this approach.</p>
        <p>Design Principle 2 - ecosystem access mechanisms. Users will be able to receive and
share results through controlled access mechanisms and download information from Genesy
locally. Users will analyze and interpret their genomic data in complete autonomy through an
ad-hoc application, and a service will regularly notify them of the possibility to acquire new
reports on their DNA, which will be produced in step with the development of new
genetic panels and the progress of medical research. Access to data by users and by
professional, academic or pharmaceutical facilities will rely on identity management guaranteed
by public-private key pair systems and cryptographic functions. In summary, Genesy will act as
an intermediary between the owner of the data and the potential users, but also as an ecosystem
manager, ensuring users' ownership of and exclusive access to their genomic data.</p>
        <p>Design Principle 3 - ecosystem cryptocurrency. A new currency that can be defined
as a utility token will allow the purchase and sale of data on the Stellar network<sup>2</sup>. This will allow
the Genesy company to monetize the sale of its services, activating an exchange market and
increasing its value as activity across the whole ecosystem grows. This will
help solve one of the most critical aspects of the genetic data market, that is, how to standardize
information and value.</p>
        <p>All transactions on the platform will be carried out in Genesy coins. The Genesy coin can be
"exchanged" in different ways and for different reasons. Users earn coins in two ways:
(1) when their DNA is sequenced, some Genesy coins will be credited to the account
associated with the user profile; and (2) if users share their DNA data on the platform,
they will be rewarded with Genesy coins based on the volume of analysis done by pharmaceutical
companies and research organizations. As a result, the longer the DNA data are shared, the higher the profit
in Genesy coins will be. These analyses will be measured by internal meters, which will distribute
Genesy coins among the accounts of all users who share their DNA. This means that even
if a particular user's DNA is not analyzed, it will still earn coins.</p>
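        <p>The reward mechanism described above can be illustrated with a minimal Python sketch. It is an illustration only, not the platform's actual metering logic: it assumes that a fixed bonus is credited at sequencing time and that the reward pool of one metering period is split among all sharing users in proportion to the number of days their data has been shared. The names SIGNUP_BONUS and distribute_pool, the numeric values, and the proportional weighting are all hypothetical.</p>
        <preformat>
```python
from fractions import Fraction

# Hypothetical number of coins credited when a user's DNA is sequenced.
SIGNUP_BONUS = 10


def distribute_pool(pool, days_shared):
    """Split a reward pool among sharing users, weighted by days shared.

    pool: coins generated by the analyses metered in one period (assumed).
    days_shared: mapping of user id to days the data has been shared.
    Every sharing user earns, whether or not their own DNA was analyzed.
    """
    total_days = sum(days_shared.values())
    if total_days == 0:
        return {user: Fraction(0) for user in days_shared}
    return {
        user: Fraction(pool * days, total_days)
        for user, days in days_shared.items()
    }


# Both users start with the (assumed) sequencing bonus.
balances = {"alice": SIGNUP_BONUS, "bob": SIGNUP_BONUS}

# Alice has shared her data twice as long as Bob, so she earns twice as much.
rewards = distribute_pool(pool=90, days_shared={"alice": 60, "bob": 30})
for user, coins in rewards.items():
    balances[user] += coins
```
</preformat>
        <p>Exact fractions are used in this sketch so that the distributed shares always add up to the pool; a real token ledger would need its own rounding rule.</p>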
        <p>In addition to the immediate compensation for the sharing of DNA and the future variable
compensation linked to the analysis performed on the platform, there is the possibility of
attracting the attention of pharmaceutical and research companies interested in certain DNA.
They will never see the name of the user: Genesy will automatically notify the user and let
them decide whether or not to contact these organizations for further analysis and
revenues.</p>
        <p><sup>2</sup> Accessible online at: https://www.stellar.org/</p>
        <p>Design Principle 4 - transparency and privacy. The system will be designed to guarantee
maximum transparency regarding what will be shared and what will be protected by privacy,
leaving users the possibility to manage it personally (partially or totally). Users will never
be forced to share their DNA without their consent. The DNA will be stored on our servers and
we will be the only ones able to connect the names of users to their DNA. However, users
may decide to share it anonymously with external companies, in which case those companies
will be able to query it for in-depth analyses, but ownership will remain with the users.</p>
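        <p>One possible way to realize the separation between user names and shared DNA records is sketched below in Python. The paper does not prescribe a technique; this sketch assumes a keyed hash (HMAC-SHA256) whose secret key is held only by the platform, so that only the platform can recompute which pseudonymous identifier belongs to which name. The names PLATFORM_KEY, pseudonym and ConsentRegistry are hypothetical.</p>
        <preformat>
```python
import hashlib
import hmac
import secrets

# Hypothetical platform-held secret; in practice it would live in a key vault.
PLATFORM_KEY = secrets.token_bytes(32)


def pseudonym(user_name, key=PLATFORM_KEY):
    """Derive a stable pseudonymous identifier from a user's name."""
    digest = hmac.new(key, user_name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


class ConsentRegistry:
    """Tracks which pseudonymous records their owners agreed to share."""

    def __init__(self):
        self._shared = set()

    def grant(self, pseudo_id):
        self._shared.add(pseudo_id)

    def revoke(self, pseudo_id):
        self._shared.discard(pseudo_id)

    def visible_to_third_parties(self, pseudo_id):
        return pseudo_id in self._shared


registry = ConsentRegistry()
alice_id = pseudonym("Alice Rossi")  # third parties only ever see this id
registry.grant(alice_id)             # sharing happens only after consent
```
</preformat>
        <p>In this sketch consent is opt-in and revocable: a record is invisible to third parties until its owner grants access, and becomes invisible again after a call to revoke.</p>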
        <p>Furthermore, the system will be implemented in compliance with the provisions of the
"Global Alliance for Genomics and Health" for responsible sharing of genomic and health data,
so as to minimize harm from data sharing and maximize benefits for those contributing
their own genome, but also for societies and health systems as a whole.</p>
        <p>In summary, our proposal aims to build a new genomic ecosystem based on the belief that all
people should (i) have possession of and free access to their genomic and health data; (ii) be able
to control who accesses their data; (iii) be sure that the genome is safely stored; (iv) be able to
improve their health in the future using their data; (v) have the opportunity to anonymously
donate their data for the public good; and (vi) be able to benefit economically from the use of
their genome by third parties.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Architecture Elements</title>
        <p>The proposed platform ensures that genome owners retain ownership of their sequenced data by
exploiting the security and immutability features offered by blockchain technology, and that
the data remain readily accessible.</p>
        <p>The software architecture of the proposed solution follows a Genome-as-a-service
architectural style, whereby genome data are marketed through the proposed ecosystem. That is, a
request for a sequencing service (e.g., to gather information on a person's risk of contracting a specific
disease) is addressed through the marketplace itself and worked out by peers (i.e.,
university research centers, private labs, hospitals, geneticists, pharmaceutical companies, etc.),
while, at the same time, the genome data are also sold through the same marketplace. Therefore,
whenever someone buys genome data, they buy it "as a service", meaning that they get that
information (i.e., the genome data) following a pay-per-use scheme, while the ownership of
the data still belongs to its original producer.</p>
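        <p>The pay-per-use idea can be made concrete with a small Python sketch. It is a simplified model under our own assumptions, not the platform's implementation: a hypothetical GenomeListing charges the buyer a fixed price for every query, credits the owner, and never transfers ownership. The class, its fields, and the in-memory wallet dictionary are illustrative only; on the platform these steps would be governed by smart contracts.</p>
        <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class GenomeListing:
    """A genome dataset offered on the marketplace (hypothetical model)."""

    owner: str
    price_per_query: int
    access_log: list = field(default_factory=list)

    def query(self, buyer, wallet):
        """Charge the buyer for one query; ownership never changes hands."""
        balance = wallet.get(buyer, 0)
        if self.price_per_query > balance:
            raise ValueError("insufficient coins for this query")
        # Move the coins from the buyer to the data owner.
        wallet[buyer] = balance - self.price_per_query
        wallet[self.owner] = wallet.get(self.owner, 0) + self.price_per_query
        # Record who accessed the data, for later auditing.
        self.access_log.append(buyer)
        return f"result of one analysis on data owned by {self.owner}"


wallet = {"lab": 5, "alice": 0}
listing = GenomeListing(owner="alice", price_per_query=2)

# The lab pays for each query separately; Alice remains the owner throughout.
listing.query("lab", wallet)
listing.query("lab", wallet)
```
</preformat>
        <p>After two queries the lab holds one coin, which is not enough for a third query, while the owner field of the listing is unchanged.</p>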
        <p>The aforementioned architectural style requires the architecture elements listed below.</p>
        <p>Blockchain node - is any node that contains public information about genomic data
and users operating on the platform (e.g., individuals, hospitals, research centers, etc.). A
blockchain node also contains information about users' transactions, for example when
they exchange coins or share information through smart contracts.</p>
        <p>Ecosystem user - customers, scholars, geneticists, pharmaceutical companies, private
labs, etc.</p>
        <p>Database - stores the results of the analysis and management of sequenced
DNA as well as off-genomic data such as user information.</p>
        <p>Smart contract - Transactions among "ecosystem users" will be regulated by smart
contracts. Transactions and meta-data will be encrypted and stored on the blockchain,
ensuring immutability and security. We consider IBM's Hyperledger Fabric well suited to
achieving the levels of reliability and security we require: it provides very high standards
of security and reliability, and also allows us to integrate innovative smart contracts
and our digital currency.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>The new frontier of medicine is personalized medicine, where doctors will be able to recommend
the most effective medicines based on our DNA. Increasingly advanced technologies will make
our DNA a priceless mine of information. We can now analyze our DNA at an affordable cost,
and soon we will be able to discover more and more about our past, present, and future. The
more DNA is analyzed, the faster the development will be. We will not only help ourselves,
but also do our part by contributing to the development of science.</p>
      <p>The solution we outlined in the previous pages combines state-of-the-art blockchain
technologies in a new way to facilitate the acquisition of genomic data and to incentivize owners of
genomic data to share them, to their advantage both in terms of rewards (through ad-hoc coins)
and health.</p>
      <p>A proof-of-concept of our proposal is currently being prototyped. A dedicated
blockchain platform on IBM Hyperledger Fabric has already been implemented, and smart
contracts have been developed for sequencing data acquisition and management procedures.
Off-chain files can be stored in the cloud on IBM systems, and the cloud pipeline that uses raw
DNA data for reporting to users has been completed. In addition, the first reports have been created,
highlighting the first somatic traits and predispositions to certain diseases and neurological
conditions.</p>
      <p>We are developing and integrating into the platform a series of software applications for the
analysis and management of the raw data coming from the DNA sequencing activities. In
terms of procedures and algorithms for the identification of genetic variants and definition of
gene panels, we have implemented a standard "Genome Analysis Toolkit" environment both
locally and in the cloud, which is adopted by the majority of genetic service providers. We work
in partnership with the National Center for Biotechnology Information genomic data platform
of the US National Institutes of Health and follow the guidelines of the American College of
Medical Genetics and Genomics.</p>
      <p>We plan to quickly complete and test the platform for the management and graphical and
tabular display of users' genomic information, as well as to create a network that includes
"interested" geneticists and researchers who can contribute evaluations or suggestions to
the construction of a medical and scientific framework for the project. Finally, we also plan to
create an internal encyclopedia based on the most important and reliable genomic databases,
which will allow us to automate the preparation of our reports through the use of new DNA
sequencing algorithms and their impact on research.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>