<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Statistical data governance based on the SDMX standard</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haïrou-Dine BIAO K.</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emery ASSOGBA</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Engineering and Telecommunications, EPAC / University of Abomey-Calavi</institution>
          ,
          <addr-line>Abomey-Calavi</addr-line>
          ,
          <country country="BJ">Benin</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Statistics are essential for the development of a nation. With the rise of technologies such as AI and big data, efficient data governance becomes increasingly important to address the challenges and opportunities they bring. Unfortunately, most of the databases in our public and private companies and organizations lack interoperability. This work proposes a statistical data governance mechanism based on the Statistical Data and Metadata eXchange (SDMX) standard, designed specifically for statistical data sharing and exchange between organizations. We designed and implemented a statistical database based on SDMX. This system allows more than 10 Beninese public organizations to produce, publish and share statistical data on various themes. They can express indicators and their levels of disaggregation in a flexible way, without having to create a new database.</p>
      </abstract>
      <kwd-group>
        <kwd>statistical data</kwd>
        <kwd>database</kwd>
        <kwd>interoperability</kwd>
        <kwd>SDMX</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Today, in an era of digitization and ever-increasing information exchange,
statistics play an essential role in the development of
nations [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To derive insight from data, the data must be
collected, validated, treated and published. This is
made possible by building databases, and applications over
these databases, to access the data. Unfortunately, the
multiplicity of these databases does not allow for efficient
data governance, because it makes exchanging
and maintaining data between different systems more difficult. This work
consists in setting up a statistical data governance
framework based on the Statistical Data and Metadata eXchange
(SDMX) standard, thus enabling other platforms
implementing this standard to easily consume the data produced by
this database, guaranteeing a high degree of
interoperability and reducing the number of databases needed to collect
statistical data.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and state of the art</title>
      <p>The rapid advent of information technology has led to a
massive explosion of data, creating unprecedented
opportunities, but also posing complex governance challenges.</p>
      <sec id="sec-2-1">
        <title>2.1. Data governance</title>
        <p>
          Data governance is defined as “an overall framework within
the company for assigning rights and duties to decisions in
order to manage data appropriately as a corporate asset”
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. It is therefore a set of principles designed to manage the
entire data lifecycle, from acquisition to disposal, including
use.
        </p>
        <p>
          Good data governance facilitates exchange and
compatibility between different systems and organizations. It thus
promotes greater interoperability. Several open data
initiatives illustrate this:
• Transnational initiatives, such as:
– the World Bank open data portal [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which is one of the major providers
of data sources;
– the databases of the Food and Agriculture
Organization (FAO) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which cover a wide
range of topics related to food security and
agriculture. These include:
∗ FAOSTAT, which provides free access
to statistics on food and agriculture
(including crop and livestock sub-sectors,
etc.);
∗ AQUASTAT, which gives users access to
the main database of country statistics,
focusing on water resources, water use
and agricultural water management.
• Continental initiatives, such as:
– openAFRICA, a volunteer-driven African
open data platform that aims to be the largest
independent repository of open data on the
African continent [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
• National initiatives, such as:
– the Benin open data portal [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <sec id="sec-2-1-1">
          <title>2.2.2. Statistical Data and Metadata eXchange (SDMX)</title>
          <p>
            SDMX is an international initiative aimed at standardizing
and modernizing statistical data and metadata exchange
mechanisms. This standard encompasses a data model
(the multidimensional data cube), standard vocabularies
(content-oriented guidelines), a formal schema definition,
and various data serialization formats for building data files
and electronic messages for data exchange. In the SDMX
ecosystem, data providers can choose between different data
serialization formats for sharing datasets, including XML,
CSV, JSON, or even EDIFACT [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ].
          </p>
          <p>
            These standards are implemented through new
technologies such as application programming interfaces (APIs).
APIs facilitate interaction between two different
applications so that they can communicate with each other. They
act as intermediaries. APIs use the Hypertext Transfer
Protocol (HTTP) for cooperation between different programs and
web services (REST or SOAP) [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. They are reusable pieces
of software that enable several applications to interact with
an information system. They offer machine-to-machine
access to data services and provide a standardized means of
managing security and errors.
          </p>
          <p>APIs are therefore catalysts for interoperability.
However, as their use increases, data security becomes a major
concern.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.3. Data security</title>
        <p>Protecting sensitive data is an important part of data
governance. It involves implementing measures and protocols
to prevent unauthorized access, leakage or manipulation of
confidential information.</p>
        <p>However, despite efforts to ensure data security,
information leaks are still a reality. Due to the rise in API-related
vulnerabilities, the Open Web Application Security Project
(OWASP), a foundation dedicated to improving software
security, has been issuing its list of the top 10 web security
vulnerabilities every 2-3 years since 2003. The OWASP
foundation’s separate classification of the top 10 vulnerabilities
for web applications and APIs highlights the divergence
between modern APIs and traditional web applications,
requiring a tailored security approach.</p>
        <p>
          The OWASP foundation provides the following list of the top 10
OWASP API vulnerabilities in 2023 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]:
• API-1: Broken Object Level Authorization (BOLA)
• API-2: Broken Authentication
• API-3: Broken Object Property Level Authorization
• API-4: Unrestricted Resource Consumption
• API-5: Broken Function Level Authorization (BFLA)
• API-6: Unrestricted Access to Sensitive Business Flows
• API-7: Server Side Request Forgery
• API-8: Security Misconfiguration
• API-9: Improper Inventory Management
• API-10: Unsafe Consumption of APIs
        </p>
        <p>
          To prevent such situations, developers need to focus on
writing secure code and ensuring that APIs are configured
securely. To guarantee API security, it is essential to
consider three fundamental pillars of security: confidentiality,
integrity, and availability [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Material and methodology</title>
      <p>The aim of this initiative is to ensure interoperability
between databases and to facilitate the distribution, availability,
use and reuse of information, while also ensuring data
security.</p>
      <sec id="sec-3-1">
        <title>3.1. Material</title>
        <p>A set of technological tools including a development
environment, programming languages and software tools was
used.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Methodology</title>
        <p>There are several stages to the process.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Description of the design of conventional statistical database systems</title>
          <p>Setting up a statistical database system involves a series of
steps.</p>
          <p>
            The diagram below summarizes the process:
• Identifying and validating indicators:
indicators are quantitative or qualitative measures
used to assess the performance or state of a
specific domain. They can include statistics such as
the unemployment rate, the economic growth rate,
the number of new businesses created [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], and so
on. This stage involves determining the statistical
indicators to be monitored. Indicators are chosen to
measure phenomena or evaluate the performance of
an action.
• Identifying and validating the producers
of indicator values: this involves identifying
the data sources and the producers responsible for
collecting the indicator values. Data producers
include the United Nations (UN), the World Bank and
the World Health Organization (WHO), among others.
• Identifying and validating the
disaggregation levels for each indicator:
this involves determining at what level of detail
data will be collected and reported (e.g., by region,
by gender, by age group, etc.).
• Database design: this involves creating a
structure for storing indicators and their values in an
organized and efficient way.
• Implementation of the web application
for data collection: this involves developing
a web application enabling data producers to submit
their data systematically and securely. Tools can be
used to facilitate this stage. The World Bank
provides Survey Solutions, a tool for
producing data collection forms. However, Survey Solutions
requires the installation of a server or the use of the
World Bank's demo server, and is limited to mobile
terminals [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ].
• Data production and validation: this
involves validating and integrating data into the
database. To ensure data reliability, several levels
of validation are often implemented. In our case,
three levels of data validation were necessary before
publication.
          </p>
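<p>The multi-level validation mentioned above can be sketched as a chain of checks before publication. This is a minimal illustrative sketch: the three levels, their rules, and all function names are assumptions, not the workflow actually implemented by the authors.</p>

```python
# Illustrative three-level validation chain before publication.
# The level names and rules are hypothetical, not the paper's actual workflow.

def check_completeness(obs):
    # Level 1 (producer): all required fields must be present
    required = {"indicator", "period", "value", "producer"}
    return required.issubset(obs)

def check_plausibility(obs):
    # Level 2 (reviewer): the observed value must be a plausible percentage
    v = obs.get("value")
    return isinstance(v, (int, float)) and v >= 0 and not v > 100

def approve_for_publication(obs):
    # Level 3 (publisher): final sign-off only if earlier levels passed
    return check_completeness(obs) and check_plausibility(obs)

obs = {"indicator": "unemployment_rate", "period": "2018",
       "value": 2.6, "producer": "EXAMPLE_ORG"}  # placeholder producer code
print(approve_for_publication(obs))  # True
```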
          <p>For example, modeling an observation of the
unemployment rate for women in rural areas for the year 2018 in a
database can be done through the following parameters:
• Indicator: unemployment rate for women in
rural areas
• Disaggregation levels:
– Commune: BAN (Banikoara)
– Department: ALI (Alibori)
• Observed value: 2.6% (percentage)
• Period: 2018
• Producer: the organization or entity responsible
for collecting and publishing these statistical data.</p>
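<p>In a conventional relational design, the observation above might be stored as a flat record like the following. The field names and the producer code are illustrative assumptions, not the schema actually used.</p>

```python
# Illustrative flat record for the example observation; all field names
# are hypothetical, not the authors' actual schema.
observation = {
    "indicator":  "unemployment_rate_women_rural",
    "commune":    "BAN",   # Banikoara
    "department": "ALI",   # Alibori
    "obs_value":  2.6,     # percentage
    "period":     2018,
    "producer":   "EXAMPLE_ORG",  # placeholder producer code
}

# The (indicator, commune, department, period) fields together identify the
# observation; adding a new disaggregation axis would require a schema change.
key = (observation["indicator"], observation["commune"],
       observation["department"], observation["period"])
print(key)
```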
          <p>However, when it comes to integrating data for any type
of indicator, with a variable number of disaggregation levels, the
task quickly becomes arduous, because it sometimes requires
rebuilding the database. Hence the need for a standard that
takes these complexities into account.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Modeling with SDMX</title>
          <p>SDMX is an essential standard for simplifying statistical
data modeling and supporting different types of databases,
whatever their level of complexity. It uses a
multidimensional approach based on the data cube model. The data
cube model, also known as the Online Analytical Processing
(OLAP) model, is a data modeling method designed to make
data analysis and visualization more accessible by
presenting it in multidimensional form. Data is organized in cubes,
with each dimension representing a distinct aspect of the
data.</p>
          <p>
            A multidimensional data cube can be thought of as a
model focused on the simultaneous measurement,
identification and description of multiple instances of an entity
type. A multidimensional dataset consists of several
measurement records (observed values) organized along a group
of dimensions (e.g., “period,” “location,” “gender,” and “age
group”) [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. Using this type of representation, it is possible
to identify each individual data point according to its
“position” on a coordinate system defined by a common set of
dimensions. In addition to measurements and dimensions,
the data cube model can also incorporate metadata at the
level of the individual data point in the form of attributes.
Attributes provide the information needed to correctly
interpret individual observations (e.g., an attribute may specify
“percentage” as the unit of measurement).
          </p>
          <p>
            The multidimensional data cube model can support data
interoperability across many different systems,
independent of their technology platform and internal architecture.
What's more, the content of a multidimensional data cube
model need not be limited to small datasets. In fact, “with
the advent of cloud and big data technologies, data cube
infrastructures have become effective instruments for
managing earth observation resources and services” [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ].
          </p>
          <p>Returning to the example of the unemployment rate
for women in rural areas for the year 2018, its data cube
representation can be identified by the following dimensions
(figure 2):
• Period = 2018
• Sex = female
• Geographical distribution = rural area
• Observed value = 2.6</p>
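<p>The cube lookup described above can be sketched as a mapping from dimension coordinates to observed values. The dimension order and codes follow the running example; the second cell is an invented value for illustration only.</p>

```python
# A minimal data cube as a mapping from dimension coordinates to observations.
# Dimension order is (period, sex, area); the male cell is a made-up value.
cube = {
    ("2018", "F", "RURAL"): 2.6,
    ("2018", "M", "RURAL"): 3.1,   # hypothetical value for illustration
}

# Each observation is identified by its "position" on the coordinate system.
value = cube[("2018", "F", "RURAL")]
print(value)  # 2.6
```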
          <p>In this way, SDMX enables data to be clearly represented
by associating measures with dimensions and attributes.
SDMX data structures, known as Data Structure
Definitions (DSD), describe how data is organized, identifying
the key dimensions, measures and associated attributes. SDMX also
provides standardized terminology for naming commonly
used dimensions and attributes, as well as code lists for
populating some of these dimensions and attributes. More
specifically, a DSD in SDMX describes the structure of a
dataset by assigning descriptor concepts to statistical data
elements, which include:
• dimensions that form the unique identifier (key) of
individual observations;
• measure(s) conventionally associated with the
concept of “observation value” (OBS_VALUE); and
• attributes that provide more information about a
part of the dataset.</p>
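<p>These three DSD elements can be sketched with a simple data class: dimensions form the key, OBS_VALUE is the measure, and attributes qualify observations. This is a schematic illustration of the concepts only, not the SDMX information model itself.</p>

```python
from dataclasses import dataclass, field

# Schematic sketch of a Data Structure Definition. Illustrative only;
# this is not the SDMX information model.

@dataclass
class DataStructureDefinition:
    dimensions: list                  # e.g. ["PERIOD", "SEX", "AREA"]
    measure: str = "OBS_VALUE"        # conventional observation-value concept
    attributes: list = field(default_factory=list)  # e.g. ["UNIT_MEASURE"]

    def key_of(self, observation: dict) -> tuple:
        # The unique identifier (key) of an observation is its dimension values.
        return tuple(observation[d] for d in self.dimensions)

dsd = DataStructureDefinition(
    dimensions=["PERIOD", "SEX", "AREA"],
    attributes=["UNIT_MEASURE"],
)
obs = {"PERIOD": "2018", "SEX": "F", "AREA": "RURAL",
       "OBS_VALUE": 2.6, "UNIT_MEASURE": "percentage"}
print(dsd.key_of(obs))  # ('2018', 'F', 'RURAL')
```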
          <p>
            In addition, SDMX offers a set of globally agreed DSDs
for different application domains, ensuring consistency and
interoperability between statistical organizations [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ].
          </p>
          <p>This eliminates the need for a multitude of statistical
databases. A single database is enough to federate all an
organization’s data. The various players can then define
the indicators and levels of disaggregation which will be
encoded in the database, ready to receive any type of statistical
data.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Modeling case with SDMX</title>
          <p>
            Taking the example of the indicator “Total number of
support requests for multiple births met” [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ], the lack of an
effective modeling framework forced the designer to define
age ranges as variables, making it difficult to render the data.
Using SDMX, the following dimension/attribute levels can
be defined:
• SEX, with the code list: F, H, TOTALF, TOTALH
• TRANCHE_D_AGE, with the code list: 0-17-ANS,
18-34-ANS, 35-59-ANS, 60-ANS-PLUS
• TYPE_HANDICAP, with the code list: HMI, HMS,
HA, HV, HM, AFH
• DEP, with the code list: ALI, ATA, etc.
• COM, with the code list: BAN, NATI, etc.
The data representation presented in [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] would amount
to this simplified representation (Table 1).
          </p>
          <p>This has the advantage of a more simplified
representation and saves storage space by eliminating redundancy. The
ability to create data structures to define data representation
offers great flexibility to data producers, who can define
context-dependent data structures for the same indicator.</p>
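<p>Code-list checking of this kind can be sketched as follows. The code lists are those of the example above; the helper name and rejection behaviour are illustrative assumptions.</p>

```python
# Validate an observation key against the example's code lists.
# The helper name and the rejection behaviour are illustrative assumptions.
CODE_LISTS = {
    "SEX": {"F", "H", "TOTALF", "TOTALH"},
    "TRANCHE_D_AGE": {"0-17-ANS", "18-34-ANS", "35-59-ANS", "60-ANS-PLUS"},
    "TYPE_HANDICAP": {"HMI", "HMS", "HA", "HV", "HM", "AFH"},
    "DEP": {"ALI", "ATA"},
    "COM": {"BAN", "NATI"},
}

def valid_key(observation: dict) -> bool:
    # Every coded dimension value must belong to its code list.
    return all(observation.get(dim) in codes
               for dim, codes in CODE_LISTS.items())

ok = valid_key({"SEX": "F", "TRANCHE_D_AGE": "18-34-ANS",
                "TYPE_HANDICAP": "HMI", "DEP": "ALI", "COM": "BAN"})
bad = valid_key({"SEX": "X", "TRANCHE_D_AGE": "18-34-ANS",
                 "TYPE_HANDICAP": "HMI", "DEP": "ALI", "COM": "BAN"})
print(ok, bad)  # True False
```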
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>SDMX provides the tools and standards needed to structure
open data in a way that maximizes its usefulness and impact.</p>
      <p>Its use has enabled us to meet a number of challenges:
• eliminating the multiplicity of databases used to collect
and process statistical indicators;
• putting an end to the disparity and scattering of
monitoring-evaluation data;
• an integrated database for storing indicators;
• effectively operationalizing the statistics development
strategy;
• establishing a coherent and scalable governance system
for statistical data;
• standardized data;
• SDMX-interoperable APIs, which focus on
retrieving metadata and data in XML, JSON and CSV formats.
They can be used as intermediaries between
SDMX-standardized systems or platforms, such
as the .Stat Suite;
• an environment for producing and disseminating
statistical data.</p>
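<p>An SDMX-compliant REST data query commonly takes the form {base}/data/{flow}/{key}, where the key is the dot-separated dimension codes in DSD order. The sketch below builds such a URL for the running example; the base URL and dataflow identifier are placeholders, not the platform's actual endpoint.</p>

```python
# Build an SDMX REST data-query URL of the common form
#   {base}/data/{flowRef}/{key}
# The base URL and dataflow ID below are placeholders, not a real endpoint.

def sdmx_data_url(base: str, flow: str, codes: list) -> str:
    key = ".".join(codes)          # e.g. "2018.F.RURAL"
    return f"{base}/data/{flow}/{key}"

url = sdmx_data_url("https://example.org/sdmx", "DF_UNEMPLOYMENT",
                    ["2018", "F", "RURAL"])
print(url)  # https://example.org/sdmx/data/DF_UNEMPLOYMENT/2018.F.RURAL
```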
      <p>This standardized data is available on a dedicated platform.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Data governance, data interoperability and data security are
interdependent and essential elements in maximizing the
value and minimizing the risks associated with the growing
use of data in our society. Implementing the SDMX standard
has enabled us to standardize data and obtain
interoperable SDMX APIs, enabling statistical data to be exchanged
between different systems or platforms. The result is an
information system based on this standard. There is no longer
any need for a multitude of statistical databases; a single
one is sufficient to federate all an organization's data. The
various players can then define the indicators and levels of
disaggregation which will be encoded in the database, ready
to receive any type of statistical data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Curien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Muet</surname>
          </string-name>
          , E. Cohen,
          <string-name>
            <given-names>M.</given-names>
            <surname>Didier</surname>
          </string-name>
          , G. Bordes, La société de l'information, La Documentation française,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <article-title>Organizing data governance: Findings from the telecommunications industry and consequences for large service providers</article-title>
          ,
          <source>Communications of the Association for Information Systems</source>
          <volume>29</volume>
          (
          <year>2011</year>
          )
          <article-title>3</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <article-title>Learning analytics interoperability - the big picture in brief</article-title>
          ,
          <source>Learning Analytics Community Exchange</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L. G.</given-names>
            <surname>González Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Orrell</surname>
          </string-name>
          ,
          <article-title>Data interoperability: A practitioner's guide to joining up data in the development sector</article-title>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>TRAORE</surname>
          </string-name>
          , Les banques de données environnementales (????).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          World Bank, World Bank Open Data,
          <year>2024</year>
          . URL: https://data.worldbank.org/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          Food and Agriculture Organization of the United Nations, Statistics,
          <year>2024</year>
          . URL: https://www.fao.org/statistics/fr.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          openAFRICA, Africa's largest volunteer-driven open data platform,
          <year>2024</year>
          . URL: https://open.africa/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Bénin</surname>
          </string-name>
          ,
          <article-title>Un coup d'oeil sur les données du Bénin</article-title>
          ,
          <year>2024</year>
          . URL: https://benin.opendataforafrica.org/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Soni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ranga</surname>
          </string-name>
          ,
          <article-title>API features individualizing of web services: REST and SOAP</article-title>
          ,
          <source>International Journal of Innovative Technology and Exploring Engineering</source>
          <volume>8</volume>
          (
          <year>2019</year>
          )
          <fpage>664</fpage>
          -
          <lpage>671</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Timsina</surname>
          </string-name>
          , L. Decker,
          <article-title>Securing the next generation of digital infrastructure: The importance of protecting modern APIs</article-title>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Asemi</surname>
          </string-name>
          ,
          <article-title>A study on API security pentesting</article-title>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Zakhidov</surname>
          </string-name>
          ,
          <article-title>Economic indicators: tools for analyzing market trends and predicting future performance</article-title>
          ,
          <source>International Multidisciplinary Journal of Universal Scientific Prospectives</source>
          <volume>2</volume>
          (
          <year>2024</year>
          )
          <fpage>23</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L. J.</given-names>
            <surname>Young</surname>
          </string-name>
          , G. Carletto,
          <string-name>
            <given-names>G.</given-names>
            <surname>Márquez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Rozkrut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Stefanou</surname>
          </string-name>
          ,
          <article-title>The production of official agricultural statistics in 2040: What does the future hold?</article-title>
          ,
          <source>Statistical Journal of the IAOS</source>
          <volume>40</volume>
          (
          <year>2024</year>
          )
          <fpage>203</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Nativi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mazzetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Craglia</surname>
          </string-name>
          ,
          <article-title>A view-based model of data-cube to support big earth data systems interoperability</article-title>
          ,
          <source>Big Earth Data</source>
          <volume>1</volume>
          (
          <year>2017</year>
          )
          <fpage>75</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>SiDoFFe-NG</surname>
          </string-name>
          ,
          <article-title>Statistiques détaillées du domaine protection sociale et solidarité nationale</article-title>
          ,
          <year>2024</year>
          . URL: https://2019a2024.sidofe-ng.social.gouv.bj/sidofepublic/stats/details/pssn.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>