<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Quality using Machine Learning: The Case of Satu Data Indonesia</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dwi Puspita Sari</string-name>
          <email>dsari@albany.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dimaz Cahya Ardhi</string-name>
          <email>dardhi@albany.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mayra E. Santiago</string-name>
          <email>msantiago1@albany.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bagus Jatmiko</string-name>
          <email>bagus.jatmiko.id@nps.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Emergency Preparedness, Homeland Security and Cybersecurity, University at Albany, State University New</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>EGOV-CeDEM-ePart conference</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Information Sciences Department, Naval Postgraduate School</institution>
          ,
          <addr-line>1 University Circle, Monterey, CA 93943</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>York</institution>
          ,
          <addr-line>1400 Washington Ave, Albany, NY 12222</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Sharing government data with the public is a government initiative to increase accountability, transparency, and citizen participation. For an open government data (OGD) portal to be used effectively, the government needs to provide good-quality OGD, which is characterized by its accuracy, relevance, completeness, availability, and timeliness. This study aims to assess the quality of OGD published through the Indonesian national OGD portal, the Satu Data Indonesia (SDI) portal, using a machine learning (ML) approach. The study begins by observing the OGD published on the portal and collecting the metadata that describes the published data. After collecting the metadata, we test its quality using an ML approach. This study uses Orange, an open-source machine learning toolkit, to provide ML predictions with a scoring system.</p>
      </abstract>
      <kwd-group>
        <kwd>open government data</kwd>
        <kwd>OGD</kwd>
        <kwd>Satu Data Indonesia</kwd>
        <kwd>machine learning</kwd>
        <kwd>data quality</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The OGD portal is one implementation of digital transformation and open
government initiatives that use technology to provide data and information. In 2019, the President
of the Republic of Indonesia issued Regulation Number 39 on “Satu Data Indonesia”, which
acts as the legal umbrella for the Indonesian government’s open data portal. The SDI portal, as the
official portal for policy data management, aims to create good-quality data that is easy to access and
can be shared among government agencies, the public sector, non-government organizations,
and the private sector. To benefit from OGD, government agencies, as OGD providers, need
to ensure that the data they publish is of good quality. Good-quality
data needs to fulfill criteria including accuracy, relevance, completeness, availability, and
timeliness. Additionally, government agencies must consider the formats in which they publish data,
because formats that are not open can limit public access [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The research question that guides our study is: What is the quality
of OGD published through the Satu Data Indonesia portal?
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Methodology</title>
      <p>
        The study focuses on a single case, the SDI portal (https://data.go.id/), the official Indonesian
OGD portal established in 2022. This case was selected to represent a developing country that
has started to develop a portal to disseminate government data from different levels
of agencies. SDI had published 395,228 datasets by April 2024. First, we observed the SDI portal
and collected the metadata of the OGD published through it. OGD metadata describes
dataset sources and details of dataset information [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Our preliminary study collected
metadata for 1,000 datasets from SDI through a random selection process. Second, we examined
dataset quality using supervised machine learning for binary classification.
We used the Orange widgets Test and Score and Predictions to evaluate our model.
      </p>
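      <p>The random selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names draw_sample and train_test_split, the fixed seed, and the 30% hold-out fraction are all hypothetical, and dataset positions stand in for the real SDI identifiers. Only the portal size (395,228) and the sample size (1,000) come from the text.</p>
      <preformat>
```python
import random

TOTAL_DATASETS = 395_228   # datasets on the portal by April 2024 (from the text)
SAMPLE_SIZE = 1_000        # size of the preliminary sample (from the text)


def draw_sample(total, size, seed=42):
    """Draw `size` distinct dataset positions from range(total) without replacement."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is repeatable
    return sorted(rng.sample(range(total), size))


def train_test_split(items, test_fraction=0.3, seed=42):
    """Shuffle a copy of `items` and split it into (train, test) lists.

    The 30% hold-out fraction is an assumption; the paper does not state one.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


sample_ids = draw_sample(TOTAL_DATASETS, SAMPLE_SIZE)
train_ids, test_ids = train_test_split(sample_ids)
print(len(sample_ids), len(train_ids), len(test_ids))
```
      </preformat>
      <p>Sampling without replacement guarantees that no dataset appears twice in the sample, and splitting after sampling keeps the training and test records disjoint.</p>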
    </sec>
    <sec id="sec-4">
      <title>3. Expected Findings and Future Work</title>
      <p>In this study, we expect to build a training dataset and test our model with different algorithms. We
expect to compare those models to determine which performs better in predicting OGD quality. The
performance scores should include the area under the receiver operating characteristic
curve (AUC), classification accuracy, F1, precision, recall, and correlation coefficient. The results of
this study contribute insight into the quality of OGD published through the SDI portal, helping the
Indonesian government understand its data quality and keep improving it
to increase public accountability and participation. Furthermore, the study will also raise awareness and
engagement among the government and citizens in implementing and using OGD to improve
citizens’ living standards through good governance supported by good-quality OGD.</p>
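      <p>To make the listed performance scores concrete, the sketch below computes them for a binary classifier in plain Python. It is an illustration only, not the study's Orange workflow. The labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and reading "correlation coefficient" as the Matthews correlation coefficient (MCC) is also an assumption.</p>
      <preformat>
```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def scores(y_true, y_pred):
    """Accuracy, precision, recall, F1, and MCC from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn,
                        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


def auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


# Toy data (hypothetical): 1 = good-quality dataset, 0 = poor-quality dataset.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 threshold is an assumption

result = scores(y_true, y_pred)
result["auc"] = auc(y_true, y_score)
print(result)
```
      </preformat>
      <p>AUC is computed from the predicted scores rather than the thresholded labels, so it can separate two models that have the same classification accuracy.</p>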
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Belhiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bounab</surname>
          </string-name>
          ,
          <article-title>Pemerintah luncurkan portal satu data indonesia</article-title>
          , in: MIT International Conference on Information Quality,
          <source>UA Little Rock</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zuiderwijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Janssen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Susha</surname>
          </string-name>
          ,
          <article-title>Improving the speed and ease of open data use through metadata, interaction mechanisms, and quality indicators</article-title>
          ,
          <source>Journal of Organizational Computing and Electronic Commerce</source>
          <volume>26</volume>
          (
          <year>2022</year>
          ). doi:https://doi.org/10.1080/10919392.2015.1125180.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <article-title>“Usage by stakeholders” as the objective of “transparency-by-design” in open government data</article-title>
          ,
          <source>Information and Learning Science</source>
          <volume>118</volume>
          (
          <year>2017</year>
          ). doi:https://doi.org/10.1108/ILS-05-2017-0034.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaasenbrood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zuiderwijk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Janssen</surname>
          </string-name>
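          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Jong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bharosa</surname>
          </string-name>
          ,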
          <article-title>Exploring the factors influencing the adoption of open government data by private organisations</article-title>
          ,
          <source>International Journal of Public Administration in the Digital Age</source>
          <volume>2</volume>
          (
          <year>2015</year>
          ). doi:https://doi.org/10.4018/ijpada.2015040105.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>