<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title></journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Ukrainian Big Data: The Problem Of Databases Localization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Victoria Vysotska</string-name>
          <email>victoria.a.vysotska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ihor Shubin</string-name>
          <email>igor.shubin@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Mezentsev</string-name>
          <email>maksym.mezentsev@nure.ua</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karen Kobernyk</string-name>
          <email>karen.kobernyk@nure.ua</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grygoryy Chetverikov</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radioelectronics</institution>
          ,
          <addr-line>Nauky ave. 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>Stepan Bandera Street, 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
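      <p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>\square \,/\, \left( \frac{\square + \square}{2} \times 0.01 \right) = \square</tex-math>
        </disp-formula>
      </p>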
      <p>To demonstrate the capabilities of the prototype, a dataset with data in the Russian language
was selected. This choice is not accidental: it allows the system to be tested under conditions
that are as close as possible to the real needs of the Ukrainian market. Translation from Russian
into Ukrainian requires not only high accuracy and preservation of context but also a deep
understanding of cultural and linguistic nuances. This poses complex tasks for the system, and
solving them requires applying the latest advances in machine learning and natural language
processing, as implemented in the translation services used in this research.</p>
      <p>The dataset uploaded into the database contains about 140,000 rows of short texts with
dependencies, spread across three tables (a localized example is shown in Table 1), and about
70,000 rows of long paragraphs in a single table (a localized example is shown in Table 2).</p>
      <p>This dataset is openly available under the name “RuBQ” and has previously been used to test
the accuracy of translation into English; here it is used to test localization from Russian into Ukrainian.</p>
      <p>The “Uid” column holds a unique identifier assigned to each record; it is used to link related
records in different tables. In our case, it relates the “Question” table to the “Answer” table.</p>
      <p>The “Question” column is a text field containing questions on various topics. This type of
data has to be translated correctly so that it remains understandable after translation, which makes
it a useful indicator of logical mistakes made by translation services.</p>
      <p>The “Answer” column has the same type as “Question” and contains the answers to the
questions stored in that column. It must also be translated properly; otherwise, the whole
database row becomes invalid and unusable. These database relations are visualized in Figure 2.</p>
      <p>The “Paragraph” column contains historical texts, each stored under a specific id. Paragraph
length varies, and paragraphs can be much longer than questions and answers, up to 1000 characters.</p>
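      <p>Because paragraphs can reach 1000 characters, and translation services typically cap how many
characters a single request may contain, long texts may have to be split before they are sent. The
Python sketch below shows one way to do this under stated assumptions: the function name
split_for_translation and the limit values are hypothetical, and the real per-request limit has to be
taken from each service's documentation. It packs whole sentences into a chunk while they fit and
falls back to fixed-size cuts only for sentences longer than the limit.</p>
      <preformat preformat-type="code">
```python
import re

# One match per sentence: text up to and including a run of '.', '!' or '?'
# plus trailing whitespace, or a final fragment without such punctuation.
_SENTENCE = re.compile(r"[^.!?]*[.!?]+\s*|[^.!?]+")


def split_for_translation(text: str, limit: int) -> list[str]:
    """Split text into chunks of at most `limit` characters (hypothetical helper).

    Whole sentences are packed into a chunk while they fit; a sentence
    longer than `limit` is cut into fixed-size slices. Joining the chunks
    with "" gives back the original text.
    """
    if not limit > 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    current = ""
    for sentence in _SENTENCE.findall(text):
        # Fallback for over-long sentences: flush the current chunk, then slice.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk if this sentence would overflow the current one.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```
      </preformat>
      <p>Joining the chunks with an empty string reproduces the original paragraph exactly, which makes
the split easy to verify; sentence boundaries are preferred over fixed offsets so that each request
carries whole sentences as context for the translator.</p>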
      <p>The visualized database is the MySQL instance used during the tests, and it contains three
databases. The databases have many-to-many relations implemented through a separate table that
stores the ids of the connected records.</p>
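      <p>The Uid-based linking and the separate id table described above can be illustrated with a small
self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL instance
used in the tests, and the table and column names (question, answer, question_answer) as well as
the sample rows are hypothetical, chosen only to mirror the description; the real schema of the
dataset may differ.</p>
      <preformat preformat-type="code">
```python
import sqlite3

# Hypothetical schema mirroring the description: questions and answers live
# in separate tables, and a junction table stores pairs of linked ids.
# sqlite3 is used here only as a self-contained stand-in for MySQL.
SCHEMA = """
CREATE TABLE question (uid INTEGER PRIMARY KEY, question TEXT NOT NULL);
CREATE TABLE answer   (uid INTEGER PRIMARY KEY, answer   TEXT NOT NULL);
CREATE TABLE question_answer (
    question_uid INTEGER NOT NULL REFERENCES question(uid),
    answer_uid   INTEGER NOT NULL REFERENCES answer(uid),
    PRIMARY KEY (question_uid, answer_uid)
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO question VALUES (?, ?)",
                     [(1, "Question 1"), (2, "Question 2")])
    conn.executemany("INSERT INTO answer VALUES (?, ?)",
                     [(10, "Answer A"), (11, "Answer B")])
    # Many-to-many: question 1 has two answers, answer 11 serves two questions.
    conn.executemany("INSERT INTO question_answer VALUES (?, ?)",
                     [(1, 10), (1, 11), (2, 11)])
    return conn


def answers_for(conn: sqlite3.Connection, question_uid: int) -> list[str]:
    """Follow the junction table from one question to all of its answers."""
    rows = conn.execute(
        """SELECT a.answer FROM answer AS a
           JOIN question_answer AS qa ON qa.answer_uid = a.uid
           WHERE qa.question_uid = ?
           ORDER BY a.uid""",
        (question_uid,),
    )
    return [row[0] for row in rows]
```
      </preformat>
      <p>Keeping the links in a separate table lets one question point to several answers and one answer
serve several questions without duplicating any text, so each text only has to be translated once.</p>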
    </sec>
    <sec id="sec-2">
      <title>4. Experiment</title>
      <p>A series of experiments was conducted to determine the most suitable database
management system for storing Ukrainian language-specific data.</p>
      <p>The experiment involved acquiring a dataset of JSON data representative of language
content that needs translation. This dataset was imported into a MySQL database to measure the
time needed to parse large quantities of data, together with the time spent on translation and
on data transfer to the translation services.</p>
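      <p>A minimal sketch of how the parsing time and the translation time can be measured separately,
assuming each JSON record holds the text to translate under a known key. The names StageTimes and
timed_translate are hypothetical, and the translate argument is a placeholder for a call to a real
translation service; with a network service, the measured translation time also includes the data
transfer.</p>
      <preformat preformat-type="code">
```python
import json
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class StageTimes:
    parse_s: float = 0.0      # time spent in json.loads on the raw dump
    translate_s: float = 0.0  # time spent in translate() calls


def timed_translate(raw_json: str, field: str,
                    translate: Callable[[str], str]) -> tuple[list[str], StageTimes]:
    """Parse a JSON array of records and translate one field of each.

    `translate` is a placeholder: with a real service it would wrap a
    network request, so translate_s then also covers data transfer.
    """
    times = StageTimes()

    start = time.perf_counter()
    records = json.loads(raw_json)
    times.parse_s = time.perf_counter() - start

    translated: list[str] = []
    start = time.perf_counter()
    for record in records:
        translated.append(translate(record[field]))
    times.translate_s = time.perf_counter() - start
    return translated, times
```
      </preformat>
      <p>time.perf_counter is used instead of wall-clock time because it is monotonic and has the highest
available resolution, which suits the short intervals measured per row.</p>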
      <p>
        The key evaluation metrics are the percentage of translation mistakes, the single-row
translation time, and the average time needed to perform a full translation and data-storing cycle.
Through comprehensive experimentation and analysis, we aim to identify the architecture that best
addresses the localization challenges of Ukrainian big data [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
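      <p>The three metrics listed above can be computed from per-row measurements as in the sketch below.
It assumes that each row records its single-row translation time, the duration of its full
translation and data-storing cycle, and a flag saying whether a translation mistake was found (how
mistakes are judged is not modelled here); the names RowResult and summarize are hypothetical.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RowResult:
    translate_s: float  # time of the single translation call for the row
    cycle_s: float      # full cycle for the row: translation plus storing
    mistake: bool       # whether a translation mistake was found in the row


def summarize(results: list[RowResult]) -> dict[str, float]:
    """Aggregate per-row measurements into the three evaluation metrics."""
    if not results:
        raise ValueError("no results to summarize")
    mistakes = sum(1 for r in results if r.mistake)
    return {
        "mistake_percent": 100.0 * mistakes / len(results),
        "mean_row_translation_s": mean(r.translate_s for r in results),
        "mean_cycle_s": mean(r.cycle_s for r in results),
    }
```
      </preformat>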
      <p>By understanding the strengths and limitations of different database solutions, we can
address database localization strategies and contribute to the development of more efficient and
effective data management practices tailored to the requirements of the Ukrainian language.
The following steps were performed during the experiment:
1. Obtained JSON data from an open source.
2. Loaded the data into a MySQL cluster.
3. Filtered the text from the data.
4. Sent the selected text to the translator.
5. Received the data back from the translator and saved it into a new instance.
The steps performed during the experiment are visualized in Figure 3.</p>
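      <p>The five steps above can be sketched end to end as follows. This is an illustration under
assumptions, not the prototype itself: in-memory sqlite3 databases stand in for the MySQL cluster
and for the new instance, the records have hypothetical uid and question fields, the table names are
hypothetical, and translate is a placeholder for a call to one of the translation services.</p>
      <preformat preformat-type="code">
```python
import json
import sqlite3
from typing import Callable


def run_pipeline(raw_json: str, translate: Callable[[str], str]) -> sqlite3.Connection:
    """Steps 1-5 of the experiment on a toy scale (sqlite3 stands in for MySQL)."""
    # Step 1: JSON data; here a string, in the experiment an open-source dump.
    records = json.loads(raw_json)

    # Step 2: load the records into the source database.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE question (uid INTEGER PRIMARY KEY, question TEXT)")
    src.executemany("INSERT INTO question VALUES (:uid, :question)", records)

    # Step 3: filter the text to translate, skipping empty fields.
    rows = src.execute(
        "SELECT uid, question FROM question "
        "WHERE question IS NOT NULL AND TRIM(question) != '' ORDER BY uid"
    ).fetchall()
    src.close()

    # Steps 4-5: send each text to the translator and store the result
    # in a new database instance.
    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE question_uk (uid INTEGER PRIMARY KEY, question TEXT)")
    dst.executemany(
        "INSERT INTO question_uk VALUES (?, ?)",
        [(uid, translate(text)) for uid, text in rows],
    )
    dst.commit()
    return dst
```
      </preformat>
      <p>With a stub such as lambda text: text.upper() the pipeline can be exercised offline; a real run
would replace the stub with a client for one of the services evaluated in Section 5.</p>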
      <p>When choosing a database for the experiment, we considered various parameters to find the
one that fits our needs. Figure 4 compares the most popular SQL databases. MySQL was chosen
for several reasons:
• MySQL is one of the most popular databases and is widely used in the software engineering
world.
• MySQL offers good performance and stability and can handle large amounts of data of
various kinds.
• It provides fast access to the data.
• It supports different types of queries and flexible data control.
• It offers convenient data management.</p>
      <p>Overall, MySQL is the best choice for our goals: we have to load a large amount of data, and
MySQL gives us easy control over the data and lets us perform any manipulations we need.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Results</title>
      <p>The evaluated results cover two types of data (short texts and long paragraphs) and six
translation services, with the data stored in a MySQL database. The results are split into
subsections, one per service, each with two tables: one for short texts and one for long texts.</p>
    </sec>
    <sec id="sec-4">
      <title>5.1. Google Translate</title>
    </sec>
    <sec id="sec-5">
      <title>5.2. Meta translator</title>
    </sec>
    <sec id="sec-6">
      <title>5.3. Reverso</title>
    </sec>
    <sec id="sec-7">
      <title>5.4. Onlinetranslator.eu</title>
    </sec>
    <sec>
      <title>5.5. DeepL</title>
    </sec>
    <sec id="sec-8">
      <title>5.6. Translate.ua</title>
    </sec>
    <sec id="sec-9">
      <title>6. Discussion</title>
      <p>Based on the results achieved, each translator has its own benefits, and the choice of service
depends on the goal. If API call speed is crucial, the service should have fast servers with low
response times and a high per-request character limit, as “Google Translate” and “Meta” do. On
the other hand, the best choice for translation accuracy is a local translator, since local services
are better adapted to the language environment and can show a low percentage of mistakes even
when little context is available for the translated texts.</p>
      <p>In fact, the choice of database and its workload had minimal impact on the average row
processing time because we used a locally hosted solution; once database location, server load,
and internet connection come into play, this time can increase significantly.</p>
    </sec>
    <sec id="sec-10">
      <title>7. Conclusion</title>
      <p>Discussing the Big Data field in Ukraine and the difficulties associated with localizing
databases brings several important topics to light.</p>
      <p>First, there are specific reasons for localizing databases in Ukraine. By respecting national
interests and guaranteeing that sensitive information stays under national control, localizing
data can improve data sovereignty, security, and regulatory compliance. However, putting
successful database localization techniques into practice requires a strong technological
foundation, technological investment, and adherence to global data standards.</p>
      <p>Second, the localization of databases in Ukraine is closely connected to broader problems of
data privacy and data availability.</p>
      <p>Third, adherence to localization requirements could result in extra expenses and
regulatory complications, which could affect corporate operations and innovation. Furthermore,
data localization strategies may obstruct international collaboration and cross-border data
flows, which would reduce prospects for global integration and economic growth.</p>
      <p>In summary, resolving database localization issues in Ukraine requires a sophisticated
strategy that strikes a balance between national and international concerns. By adhering to the
values of availability, accountability, and inclusion, Ukraine can successfully navigate the
challenges of data localization and realize the full potential of its big data ecosystem for the
good of society at large.</p>
    </sec>
    <sec id="sec-11">
      <title>8. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Orlovskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Orekhov</surname>
          </string-name>
          ,
          <article-title>An Approach and Software Prototype for Translation of Natural Language Business Rules into Database Structure</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>2870</volume>
          (
          <year>2021</year>
          )
          <fpage>1274</fpage>
          -
          <lpage>1291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Garcarz</surname>
          </string-name>
          ,
          <article-title>Legal Language Translation: Theory behind the Practice</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>3171</volume>
          (
          <year>2022</year>
          )
          <fpage>2</fpage>
          -
          <lpage>2</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Shubin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kozyriev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Liashik</surname>
          </string-name>
          , G. Chetverykov,
          <article-title>Methods of adaptive knowledge testing based on the theory of logical networks</article-title>
          ,
          <source>in: Proceedings of the 5th International Conference on Computational Linguistics and Intelligent Systems, COLINS</source>
          <year>2021</year>
          , Lviv, Ukraine,
          <year>2021</year>
          , pp.
          <fpage>1184</fpage>
          -
          <lpage>1193</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stirewalt</surname>
          </string-name>
          ,
          <article-title>ORM ontologies with executable derivation rules to support semantic search in large-scale data applications</article-title>
          , in:
          <source>Proceedings - ACM/IEEE 25th International Conference on Model Driven Engineering Languages and Systems, MODELS 2022: Companion Proceedings</source>
          (
          <year>2022</year>
          )
          <fpage>81</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Orlovskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Orekhov</surname>
          </string-name>
          ,
          <article-title>An Approach and Software Prototype for Translation of Natural Language Business Rules into Database Structure</article-title>
          ,
          <source>CEUR Workshop Proceedings</source>
          <volume>2870</volume>
          (
          <year>2021</year>
          )
          <fpage>1274</fpage>
          -
          <lpage>1291</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kulik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chukhray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Havrylenko</surname>
          </string-name>
          ,
          <article-title>Information Technology for Creating Intelligent Computer Programs for Training in Algorithmic Tasks. Part 1: Mathematical Foundations</article-title>
          ,
          <source>System Research and Information Technologies</source>
          (
          <issue>4</issue>
          ) (
          <year>2022</year>
          )
          <fpage>27</fpage>
          -
          <lpage>41</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.20535/SRIT.2308-8893.2021.4.02</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Falatiuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shirokopetleva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dudar</surname>
          </string-name>
          ,
          <article-title>Investigation of architecture and technology stack for e-archive system</article-title>
          ,
          <source>in: 2019 IEEE International Scientific-Practical Conference: Problems of Infocommunications Science and Technology, PIC S and T 2019 - Proceedings</source>
          , p.
          <fpage>229</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Herud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Baumeister</surname>
          </string-name>
          ,
          <article-title>Testing Product Configuration Knowledge Bases Declaratively</article-title>
          , in:
          <conf-name>LWDA 2022 - Workshops: Special Interest Group on Knowledge Management (FGWM), Knowledge Discovery, Data Mining, and Machine Learning (FGKD) and Special Interest Group on Database Systems (FGDB)</conf-name>
          ,
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>3341</volume>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Shubin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kozyriev</surname>
          </string-name>
          ,
          <article-title>Method for Solving Quantifier Linear Equations for Formation of Optimal Queries to Databases</article-title>
          , in:
          <source>Computational Linguistics and Intelligent Systems 2023: Proceedings of the 7th International Conference on Computational Linguistics and Intelligent Systems</source>
          , pp.
          <fpage>449</fpage>
          -
          <lpage>459</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>A Method for Scientific Cultivation Analysis Based on Knowledge Graphs</article-title>
          , in:
          <source>12th International Conference on Electronics, Communications and Networks, CECNet 2022</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>