Statistical data governance based on the SDMX standard

Haïrou-Dine BIAO K.*,† and Emery ASSOGBA†
Department of Computer Engineering and Telecommunications, EPAC / University of Abomey-Calavi, Abomey-Calavi, Benin
* Corresponding author: dineb90@gmail.com (H. B. K.); emery.assogba@uac.bj (E. ASSOGBA)
† These authors contributed equally.
International Conference of Information and Communication Technologies of ANSALB (CITA): Security issues in the age of AI, June 27-28, 2024, Cotonou, BENIN

Abstract

Statistics are essential for the development of a nation. With the rise of technologies such as AI and big data, efficient data governance is becoming more and more important for meeting the challenges and seizing the opportunities they bring. Unfortunately, most databases in our public and private companies and organizations lack interoperability. This work proposes a statistical data governance mechanism based on the Statistical Data and Metadata eXchange (SDMX) standard, designed specifically for sharing and exchanging statistical data between organizations. We designed and implemented a statistical database based on SDMX. This system allows more than ten Beninese public organizations to produce, publish and share statistical data on various themes. They can express indicators and the disaggregation levels of these indicators in a flexible way, without having to create a new database.

Keywords: statistical data, database, interoperability, SDMX

1. Introduction

Today, with digitization and increasing information exchange, statistics play an essential role in the development of nations [1]. To gain insight from data, the data must be collected, validated, published and processed. This is made possible by building databases, and applications over these databases to access the data. Unfortunately, the multiplicity of these databases does not allow for efficient data governance, because it makes it more difficult to exchange and maintain data between different systems. This work consists in setting up a statistical data governance framework based on the Statistical Data and Metadata eXchange (SDMX) standard, thus enabling other platforms implementing this standard to easily consume the data produced by this database, guaranteeing a high degree of interoperability and reducing the number of databases needed to collect statistical data.

2. Background and state of the art

The rapid advent of information technology has led to a massive explosion of data, creating unprecedented opportunities but also posing complex governance challenges.

2.1. Data governance

Data governance is defined as "an overall framework within the company for assigning decision-related rights and duties in order to manage data appropriately as a corporate asset" [2]. It is therefore a set of principles designed to manage the entire data lifecycle, from acquisition to disposal, including use. Good data governance facilitates exchange and compatibility between different systems and organizations; it thus promotes greater interoperability.

2.2. Interoperability

IEEE defines interoperability as "the ability of two or more systems or components to exchange information and to use the information that has been exchanged" [3]. A specific challenge to interoperability arises from the fact that there is generally no single way of representing information. Thus, the same information content is often represented in different (usually incompatible) ways across different systems and organizations [4]. Data interoperability therefore requires not only the use of standards and metadata, but also the provision of standardized datasets in formats that can be accessed by both humans and machines.

International standards exist for this purpose:

• Open data
• Statistical Data and Metadata eXchange (SDMX)

2.2.1. Open data

Open data refers to data that is freely available for everyone to use, modify and share without restriction. For optimal interoperability, data and metadata files must be published in such a way as to be editable by humans and usable by machines, while remaining independent of language, technology and infrastructure. A first step is to make data available via bulk downloads in open data formats. There are various fully documented and widely accepted schemas for constructing digital data files, such as CSV, JSON, XML and GeoJSON, among others [4]; a short sketch after the list below illustrates two of them. In the context of open data, several catalogs list portals publishing public data [5]. Initiatives include:

• Transnational initiatives, such as:
  – the World Bank [6], one of the main promoters of open data sources;
  – the databases of the Food and Agriculture Organization (FAO) [7], which cover a wide range of topics related to food security and agriculture. These include:
    ∗ FAOSTAT, which provides free access to statistics on food and agriculture (including the crop and livestock sub-sectors, etc.);
    ∗ AQUASTAT, which gives users access to the main database of country statistics, focusing on water resources, water use and agricultural water management.
• Continental initiatives, such as:
  – openAFRICA, a volunteer-driven open data platform that aims to be the largest independent repository of open data on the African continent [8].
• National initiatives, such as:
  – the Benin open data portal [9].
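As a minimal illustration of these machine-readable formats, the following sketch (our own, built around a hypothetical observation record rather than data from any of the portals above) writes the same record as both CSV and JSON using only the Python standard library:

```python
# Publish one hypothetical observation in two open formats (CSV and JSON).
import csv
import json

observation = {
    "indicator": "unemployment_rate",  # illustrative record, not real portal data
    "area": "rural",
    "sex": "F",
    "period": "2018",
    "value": 2.6,
    "unit": "percentage",
}

# CSV: a header row plus one record, readable by humans and machines alike.
with open("observation.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(observation))
    writer.writeheader()
    writer.writerow(observation)

# JSON: the same record, independent of language, technology and infrastructure.
with open("observation.json", "w", encoding="utf-8") as f:
    json.dump(observation, f, ensure_ascii=False, indent=2)
```

Either file can be bulk-downloaded and parsed without any knowledge of the system that produced it, which is exactly the interoperability property open data formats aim for.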
2.2.2. Statistical Data and Metadata eXchange (SDMX)

SDMX is an international initiative aimed at standardizing and modernizing statistical data and metadata exchange mechanisms. The standard encompasses a data model (the multidimensional data cube), standard vocabularies (content-oriented guidelines), a formal schema definition, and various data serialization formats for building data files and electronic messages for data exchange. In the SDMX ecosystem, data providers can choose between different serialization formats for sharing datasets, including XML, CSV, JSON, or even EDIFACT [4].

These standards are implemented through new technologies such as application programming interfaces (APIs). APIs facilitate interaction between two different applications so that they can communicate with each other; they act as intermediaries. APIs use the Hypertext Transfer Protocol (HTTP) for cooperation between different programs and web services (REST or SOAP) [10]. They are reusable pieces of software that enable several applications to interact with an information system. They offer machine-to-machine access to data services and provide a standardized means of managing security and errors.
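As a hedged sketch of such a machine-to-machine call, the snippet below queries a hypothetical SDMX REST web service with the Python requests library. The host name, dataflow identifier and dimension key are placeholders of our own, but the /data/{flow}/{key} URL pattern and the period parameters follow the SDMX REST conventions:

```python
# Retrieve observations from a hypothetical SDMX REST endpoint.
import requests

BASE_URL = "https://stats.example.org/rest"  # placeholder SDMX web service
FLOW = "DF_UNEMPLOYMENT"                     # hypothetical dataflow identifier
KEY = "A.F.RURAL"                            # one code per dimension, dot-separated

response = requests.get(
    f"{BASE_URL}/data/{FLOW}/{KEY}",
    params={"startPeriod": "2018", "endPeriod": "2018"},
    # Content negotiation selects the serialization; SDMX-ML or SDMX-CSV
    # can be requested the same way with a different Accept header.
    headers={"Accept": "application/vnd.sdmx.data+json"},
    timeout=30,
)
response.raise_for_status()  # standardized HTTP error handling
dataset = response.json()    # observations keyed by their dimension coordinates
```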
APIs are therefore catalysts for interoperability. However, as their use increases, data security becomes a major concern.

2.3. Data security

Protecting sensitive data is an important part of data governance. It involves implementing measures and protocols to prevent unauthorized access to, leakage of, or manipulation of confidential information.

However, despite efforts to ensure data security, information leaks are still a reality. Owing to the rise in API-related vulnerabilities, the Open Web Application Security Project (OWASP), a foundation dedicated to improving software security, has been issuing its list of the top 10 web security vulnerabilities every two to three years since 2003. The OWASP foundation's separate rankings of the top 10 vulnerabilities for web applications and for APIs highlight the divergence between modern APIs and traditional web applications, which calls for a tailored security approach. The OWASP foundation's list of the top 10 API vulnerabilities in 2023 is as follows [11]:

• API-1: Broken Object Level Authorization (BOLA)
• API-2: Broken Authentication
• API-3: Broken Object Property Level Authorization
• API-4: Unrestricted Resource Consumption
• API-5: Broken Function Level Authorization (BFLA)
• API-6: Unrestricted Access to Sensitive Business Flows
• API-7: Server Side Request Forgery
• API-8: Security Misconfiguration
• API-9: Improper Inventory Management
• API-10: Unsafe Consumption of APIs

To prevent such situations, developers need to focus on writing secure code and ensuring that APIs are configured securely. To guarantee API security, it is essential to consider the three fundamental pillars of security: confidentiality, integrity and availability [12].

3. Material and methodology

The aim of this initiative is to ensure interoperability between databases and to facilitate the distribution, availability, use and reuse of information, while also attending to data security.

3.1. Material

A set of technological tools was used, including a development environment, programming languages and software tools.

3.2. Methodology

The process involves several stages.

3.2.1. Description of the design of conventional statistical database systems

Setting up a statistical database system involves a series of steps, summarized in Figure 1.

Figure 1: Illustration of the process of setting up a statistical database system.

These steps are mainly:

• Identifying and validating indicators: indicators are quantitative or qualitative measures used to assess the performance or state of a specific domain. They include statistics such as the unemployment rate, the economic growth rate, the number of new businesses created [13], and so on. This stage involves determining the statistical indicators to be monitored; indicators are chosen to measure phenomena or to evaluate the performance of an action.
• Identifying and validating the producers of indicator values: this involves identifying the data sources and the producers responsible for collecting the indicator values. Data producers include the United Nations (UN), the World Bank, the World Health Organization (WHO), etc.
• Identifying and validating the disaggregation levels of each indicator: this involves determining at what level of detail data will be collected and reported (e.g., by region, by gender, by age group, etc.).
• Database design: this involves creating a structure for storing indicators and their values in an organized and efficient way.
• Implementation of the web application for data collection: this involves developing a web application enabling data producers to submit their data systematically and securely. Tools can be developed to facilitate this stage. The World Bank provides Survey Solutions, a tool for producing data collection forms; however, Survey Solutions requires installing a server or using the World Bank's demo server, and it is limited to mobile terminals [14].
• Data production and validation: this involves validating data and integrating it into the database. To ensure data reliability, several levels of validation are often implemented; in our case, three levels of data validation were necessary before publication.

For example, an observation of the unemployment rate for women in rural areas for the year 2018 can be modeled in a database through the following parameters:

• Indicator: unemployment rate for women in rural areas
• Disaggregation levels:
  – Commune: BAN (Banikoara)
  – Department: ALI (Alibori)
• Observed value: 2.6% (percentage)
• Period: 2018
• Producer: the organization or entity responsible for collecting and publishing these statistical data

However, when it comes to integrating data with any type of indicator and several variable disaggregation levels, the task quickly becomes arduous, because it sometimes requires rebuilding the database. Hence the need for a standard that takes these complexities into account.
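To make this rigidity concrete, here is a minimal sketch of the conventional fixed-column design for the example above (the table and column names are hypothetical choices of ours): every disaggregation level is hard-wired as a column, so introducing a new level means altering the schema.

```python
# Conventional design: one fixed column per disaggregation level.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE observations (
        indicator  TEXT,  -- e.g., unemployment rate for women in rural areas
        department TEXT,  -- ALI (Alibori)
        commune    TEXT,  -- BAN (Banikoara)
        period     TEXT,  -- 2018
        value      REAL,  -- 2.6 (percentage)
        producer   TEXT
    )
""")
conn.execute(
    "INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?)",
    ("unemployment_rate_rural_women", "ALI", "BAN", "2018", 2.6, "producer_org"),
)

# Adding a new disaggregation level (e.g., age group) forces a schema change,
# and an indicator with different levels would need its own table or database.
conn.execute("ALTER TABLE observations ADD COLUMN age_group TEXT")
```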
3.2.2. Modeling with SDMX

SDMX is an essential standard for simplifying statistical data modeling and supporting different types of databases, whatever their level of complexity. It uses a multidimensional approach based on the data cube model. The data cube model, also known as the Online Analytical Processing (OLAP) model, is a data modeling method designed to make data analysis and visualization more accessible by presenting data in multidimensional form. Data is organized in cubes, with each dimension representing a distinct aspect of the data.

A multidimensional data cube can be thought of as a model focused on the simultaneous measurement, identification and description of multiple instances of an entity type. A multidimensional dataset consists of several measurement records (observed values) organized along a group of dimensions (e.g., "period," "location," "gender," and "age group") [4]. Using this type of representation, it is possible to identify each individual data point by its "position" in a coordinate system defined by a common set of dimensions. In addition to measurements and dimensions, the data cube model can also incorporate metadata at the level of the individual data point in the form of attributes. Attributes provide the information needed to correctly interpret individual observations (e.g., an attribute may specify "percentage" as the unit of measurement).

The multidimensional data cube model can support data interoperability across many different systems, independent of their technology platform and internal architecture. Moreover, the content of a multidimensional data cube model need not be limited to small datasets. In fact, "with the advent of cloud and big data technologies, data cube infrastructures have become effective instruments for managing earth observation resources and services" [15].

Returning to the example of the unemployment rate for women in rural areas in 2018, its data cube representation can be identified by the following dimensions (Figure 2):

• Period = 2018
• Sex = female
• Geographical distribution = rural area
• Observed value = 2.6

Figure 2: Example of observation representation using the data cube model.

In this way, SDMX enables data to be clearly represented by associating measures with dimensions and attributes. SDMX data structures, known as Data Structure Definitions (DSDs), describe how data is organized by identifying the key dimensions, measures and associated attributes. SDMX also provides standardized terminology for naming commonly used dimensions and attributes, as well as code lists for populating some of them. More specifically, a DSD in SDMX describes the structure of a dataset by assigning descriptor concepts to statistical data elements, which include:

• dimensions, which form the unique identifier (key) of individual observations;
• measure(s), conventionally associated with the concept of "observation value" (OBS_VALUE); and
• attributes, which provide more information about a part of the dataset.

In addition, SDMX offers a set of globally agreed DSDs for different application domains, ensuring consistency and interoperability between statistical organizations [4]. This eliminates the need for a multitude of statistical databases: a single database is enough to federate all of an organization's data. The various players can then define the indicators and disaggregation levels to be encoded in the database, ready to receive any type of statistical data.
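The following conceptual sketch, a deliberately simplified rendering of our own (real DSDs are exchanged as SDMX artefacts, not Python objects), shows what a DSD declares for the unemployment example: the dimensions that form the observation key, the OBS_VALUE measure, and an interpretive attribute, each backed by an illustrative code list:

```python
# Simplified, illustrative view of a Data Structure Definition (DSD).
DSD_UNEMPLOYMENT = {
    "dimensions": {
        "PERIOD": None,                        # time dimension, free values
        "SEX": ["F", "M", "_T"],               # illustrative code list
        "GEO_AREA": ["RURAL", "URBAN", "_T"],  # illustrative code list
    },
    "measure": "OBS_VALUE",
    "attributes": {
        "UNIT_MEASURE": ["percentage"],  # tells readers how to read OBS_VALUE
    },
}

# One observation, identified by its position in the cube the DSD defines:
observation = {
    "key": {"PERIOD": "2018", "SEX": "F", "GEO_AREA": "RURAL"},
    "OBS_VALUE": 2.6,
    "attributes": {"UNIT_MEASURE": "percentage"},
}
```

Because every observation carries its full dimension key, new indicators or disaggregation levels only require new DSDs and code lists, not a new database schema.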
3.2.3. Modeling case with SDMX

Consider the indicator "Total number of support requests for multiple births met" [16]. The lack of an effective modeling framework forced the designer to define age ranges as variables, making it difficult to render the data. Using SDMX, the following dimension/attribute levels can be defined:

• SEX, with the code list: F, H, TOTALF, TOTALH
• TRANCHE_D_AGE, with the code list: 0-17-ANS, 18-34-ANS, 35-59-ANS, 60-ANS-PLUS
• TYPE_HANDICAP, with the code list: HMI, HMS, HA, HV, HM, AFH
• DEP, with the code list: ALI, ATA, etc.
• COM, with the code list: BAN, NATI, etc.

The data representation presented in [16] would then amount to the simplified representation of Table 1.

Table 1
Use of SDMX for improved data representation of an existing platform

Indicator | DEP | COM | SEX     | TRANCHE_D_AGE | TYPE_HANDICAP | Period | Value
1         | ALI | BAN | TOTAL_F | _T            | _T            | 2024   | 1

This has the advantage of a more compact representation and saves storage space by eliminating redundancy. The ability to create data structures that define the data representation offers great flexibility to data producers, who can define context-dependent data structures for the same indicator.
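As a hedged sketch of this remodeling (the input record is illustrative, not actual SiDoFFe-NG data, and the legacy column names are our own guesses at the old layout), the function below unpivots the per-age-range variables of the old representation into one SDMX-style row per observation, with TRANCHE_D_AGE as an ordinary coded dimension:

```python
# Unpivot a legacy "one column per age range" record into SDMX-style rows.
legacy_record = {
    "indicator": 1, "DEP": "ALI", "COM": "BAN", "SEX": "TOTAL_F",
    "period": "2024",
    # one variable per age range in the legacy model:
    "0_17_ANS": 1, "18_34_ANS": 0, "35_59_ANS": 0, "60_ANS_PLUS": 0,
}

AGE_CODES = ["0-17-ANS", "18-34-ANS", "35-59-ANS", "60-ANS-PLUS"]

def to_sdmx_rows(record):
    """Turn age-range columns into values of the TRANCHE_D_AGE dimension."""
    rows = []
    for code in AGE_CODES:
        rows.append({
            "Indicator": record["indicator"], "DEP": record["DEP"],
            "COM": record["COM"], "SEX": record["SEX"],
            "TRANCHE_D_AGE": code, "TYPE_HANDICAP": "_T",
            "Period": record["period"],
            "Value": record[code.replace("-", "_")],
        })
    return rows

for row in to_sdmx_rows(legacy_record):
    print(row)
```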
4. Results

SDMX provides the tools and standards needed to structure open data in a way that maximizes its usefulness and impact. Its use has enabled us to meet a number of challenges:

• eliminate the multiplicity of databases used to collect and process statistical indicators;
• put an end to the disparity and scattering of monitoring and evaluation data;
• provide an integrated database for storing indicators;
• effectively operationalize the statistics development strategy;
• establish a coherent and scalable governance system for statistical data;
• standardize data;
• expose interoperable SDMX APIs, which focus on retrieving metadata and data in XML, JSON and CSV formats and can be used as intermediaries between SDMX-standardized systems or platforms (such as the .Stat Suite);
• provide an environment for producing and disseminating statistical data.

This standardized data is available on a dedicated platform.

5. Conclusion

Data governance, data interoperability and data security are interdependent and essential elements in maximizing the value and minimizing the risks associated with the growing use of data in our society. Implementing the SDMX standard has enabled us to standardize data and to obtain interoperable SDMX APIs through which statistical data can be exchanged between different systems or platforms. The result is an information system based on this standard. There is no longer any need for a multitude of statistical databases; a single one is sufficient to federate all of an organization's data. The various players can then define the indicators and disaggregation levels to be encoded in the database, ready to receive any type of statistical data.

References

[1] N. Curien, P.-A. Muet, E. Cohen, M. Didier, G. Bordes, La société de l'information, La Documentation française, 2004.
[2] B. Otto, Organizing data governance: Findings from the telecommunications industry and consequences for large service providers, Communications of the Association for Information Systems 29 (2011) 3.
[3] A. Cooper, Learning analytics interoperability: the big picture in brief, Learning Analytics Community Exchange (2014).
[4] L. G. González Morales, T. Orrell, Data interoperability: A practitioner's guide to joining up data in the development sector, 2018.
[5] M. Traore, Les banques de données environnementales (n.d.).
[6] World Bank, World Bank open data, 2024. URL: https://data.worldbank.org/.
[7] Food and Agriculture Organization of the United Nations, Statistiques, 2024. URL: https://www.fao.org/statistics/fr.
[8] openAFRICA, Africa's largest volunteer driven open data platform, 2024. URL: https://open.africa/.
[9] Bénin, Un coup d'oeil sur les données du Bénin, 2024. URL: https://benin.opendataforafrica.org/.
[10] A. Soni, V. Ranga, API features individualizing of web services: REST and SOAP, International Journal of Innovative Technology and Exploring Engineering 8 (2019) 664-671.
[11] D. Timsina, L. Decker, Securing the next generation of digital infrastructure: The importance of protecting modern APIs (2023).
[12] H. Asemi, A study on API security pentesting (2023).
[13] G. Zakhidov, Economic indicators: tools for analyzing market trends and predicting future performance, International Multidisciplinary Journal of Universal Scientific Prospectives 2 (2024) 23-29.
[14] L. J. Young, G. Carletto, G. Márquez, D. A. Rozkrut, S. Stefanou, The production of official agricultural statistics in 2040: What does the future hold?, Statistical Journal of the IAOS 40 (2024) 203-210.
[15] S. Nativi, P. Mazzetti, M. Craglia, A view-based model of data-cube to support big earth data systems interoperability, Big Earth Data 1 (2017) 75-99.
[16] SiDoFFe-NG, Statistiques détaillées du domaine protection sociale et solidarité nationale, 2024. URL: https://2019a2024.sidoffe-ng.social.gouv.bj/sidoffepublic/stats/details/pssn.