<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Java2Pseudo: Java to Pseudo Code Translator a Pilot Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Heetae Cho</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Seonah Lee</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of AI Convergence Engineering, Gyeongsang National University</institution>
          ,
          <addr-line>501 Jinju-daero, Jinju-si, Gyeongsangnam-do</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Aerospace and Software Engineering, Gyeongsang National University</institution>
          ,
          <addr-line>501 Jinju-daero, Jinju-si, Gyeongsangnam-do</addr-line>
          ,
          <country>Republic of Korea</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Novice programmers may not quickly understand a new or unfamiliar programming language. To help them understand source code, existing studies have proposed approaches that translate a programming language into pseudo-code. However, to the best of our knowledge, no study has proposed translating Java, one of the best-known object-oriented programming languages, into pseudo-code. Many novice programmers learn Java because it is less complicated than C++ and versatile in practical use. Furthermore, educational curricula frequently include a Java course. In this paper, we propose an approach that translates Java into pseudo-code at the code-fragment level. We expect that the proposed approach could help novice programmers learn the Java language through the translated pseudo-code.</p>
      </abstract>
      <kwd-group>
        <kwd>Java</kwd>
        <kwd>Program Comprehension</kwd>
        <kwd>Pseudo-code</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>Pseudo-code presents the logic of an algorithm in a natural language that imitates a programming language. Because it is written in natural-language form, pseudo-code is more readable and understandable than the programming language read directly. Therefore, novice programmers could more easily understand the meaning of actual source code through pseudo-code.</p>
        <p>
          For this reason, several studies have proposed approaches to translating source code into pseudo-code [
          <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5 ref6 ref7 ref8">1, 2, 3, 4, 5, 6, 7, 8</xref>
          ]. However, none of these studies translated Java source code into pseudo-code.
        </p>
        <p>
          Furthermore, most studies used machine learning techniques [
          <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref6 ref8">1, 2, 3, 4, 6, 8</xref>
          ], which require much effort to gather pseudo-code data corresponding to source code.
        </p>
        <p>Although machine learning techniques are straightforward, we hypothesize that a programming language is formal enough to generate pseudo-code using rules, as in a compiler process.</p>
        <p>In this paper, we propose an approach that translates Java source code into pseudo-code line by line, using contexts for matching code patterns and templates for translating the matched patterns into pseudo-code. Unlike machine learning techniques, this approach does not need the effort of gathering data.</p>
        <p>This paper is organized as follows. Section 2 describes related works for our study. Section 3 introduces our approach. Section 4 presents the demonstration results. Section 5 describes the discussion. Finally, Section 6 concludes this paper.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2. Related Works</title>
        <p>While existing studies translate source code to pseudo-code, most studies [1, 2, 3, 4, 6, 8] targeted Python source code. These studies also used machine-learning techniques.</p>
        <p>Oda et al. [<xref ref-type="bibr" rid="ref1">1</xref>] used statistical machine translation techniques, namely phrase-based machine translation and tree-to-string machine translation, to generate pseudo-code from Python source code. Xu et al. [<xref ref-type="bibr" rid="ref2">2</xref>] and Alhefdhi et al. [<xref ref-type="bibr" rid="ref6">6</xref>] used sequence-to-sequence and attention approaches. They encode Python source code with an LSTM encoder and decode it to pseudo-code with an LSTM decoder. Yang et al. [<xref ref-type="bibr" rid="ref3">3</xref>] used a CNN and the transformer architecture [<xref ref-type="bibr" rid="ref9">9</xref>]. They first extract code features using the CNN model. Then the extracted features are fed to the transformer architecture to generate pseudo-code. Gad et al. [<xref ref-type="bibr" rid="ref4">4</xref>] and Alokla et al. [<xref ref-type="bibr" rid="ref8">8</xref>] also used the transformer architecture. They first tokenize source code with their own rules, vectorize the tokens, and apply position embedding. Then the embedded source code is fed to the transformer architecture to generate pseudo-code.</p>
        <p>These studies treated source code as natural language. However, their approaches have a potential shortcoming: they regard contexts as distinct even when the contexts are the same and differ only in their identifiers. For instance, from a pseudo-code perspective, the contexts "int a" and "int b" are equally treated as variable declarations. Treated as natural language, however, different vectors are created because of the different words ’a’ and ’b’.</p>
        <p>Two studies used contexts and templates similar to our study [<xref ref-type="bibr" rid="ref5 ref7">5, 7</xref>]. However, they target JavaScript and Python, whose code-line contexts differ from those of Java.</p>
      </sec>
      <sec>
        <title>3. Approach</title>
        <p>Our approach to translating Java source code to pseudo-code, inspired by a compiler, is as follows:</p>
        <list list-type="bullet">
          <list-item><p>step-1) Removes all comments and blank lines and tokenizes the source code</p></list-item>
          <list-item><p>step-2) Changes code tokens to pre-defined symbols and keeps the original code tokens of each symbol</p></list-item>
          <list-item><p>step-3) Changes the symbols to specific symbols by checking the original tokens</p></list-item>
          <list-item><p>step-4) Finds patterns line-by-line from pre-defined patterns through the longest symbol sequence-first match</p></list-item>
          <list-item><p>step-5) Generates pseudo-code of the matched patterns through pre-defined pseudo-code templates</p></list-item>
        </list>
        <p>Samples of the pre-defined symbols, patterns, and templates are shown in Figures 1 - 3, respectively. First, the pre-defined symbols, as shown in Figure 1, represent the roles of the code tokens and assign an ID to each token. Second, the patterns shown in Figure 2 express a unique sequence of symbols and IDs. Finally, the templates, as shown in Figure 3, describe each pattern.</p>
      </sec>
      <sec>
        <title>4. Demonstration</title>
        <sec>
          <title>4.1. Approach Demonstration</title>
          <p>To demonstrate our approach, we depict the following steps. For step 1, as shown in Figure 4, we remove comments and blank lines from the entered source code and tokenize it. For step 2, shown in Figure 5, we replace all the code tokens with pre-defined symbols. For step 3, we replace the symbols with unique IDs, as shown in Figure 6. For step 4, as shown in Figure 7, we use the pre-defined patterns to find the longest pattern from left to right in the symbol ID sequence, and if one is found, we iteratively search from the following ID. If no pattern is found, we leave the symbol ID and search again from the following ID. Finally, as shown in Figure 8, we generate the pseudo-code with the pre-defined templates.</p>
        </sec>
        <sec>
          <title>4.2. Prototype</title>
          <p>We implemented a prototype of our translation approach as a web application. Figures 9-11 show the demonstration of our approach. Figure 9 shows the textarea for inputting the Java source code. When the translate button is clicked, the input source code changes to symbolic code through steps 1 to 4 of the approach. For example, the source code line “Scanner scan = new Scanner (System.in);” is changed to “##Identifier## ##Identifier## = ##Keyword## ##Identifier## ( ##Identifier## . ##Identifier## ) ;”, as shown in Figure 10. Then, the symbols are transformed into specific symbols. After that, by using the longest symbol sequence-first match, our approach replaces the symbolic code with symbol sequence patterns based on the pre-defined symbol patterns. For example, the symbolic code above changes to [’(id id =)’, ’(keyword id)’, ’( ( id . id )’ ’( ) )’], which is a sequence of pre-defined symbol patterns. Finally, pseudo-code is generated by extracting the pseudo-code for each pattern from a pre-defined pseudo-code template, as shown in Figure 11. For example, (id id =) is changed to “create Scanner type object variable input and assign with”, (keyword id) is changed to “Scanner object”, and (id . id) is changed to “with parameters ( System . in )”.</p>
        </sec>
      </sec>
      <sec>
        <title>5. Discussion</title>
        <list list-type="bullet">
          <list-item><p>method call lines (17.1%) (e.g., method();)</p></list-item>
          <list-item><p>package/import lines (16.0%) (e.g., import java.util.*;)</p></list-item>
          <list-item><p>variable definition lines (7.6%) (e.g., int a=1; except object)</p></list-item>
          <list-item><p>method definition lines (6.6%) (e.g., int method () )</p></list-item>
          <list-item><p>if or else-if lines (5.5%)</p></list-item>
        </list>
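        <p>The line categories listed above could be recognised from symbol sequences roughly as follows. This is a minimal Python sketch and not the prototype's implementation: the symbol names (type, id, num), the keyword and type sets, and the helpers to_symbols and line_role are assumptions made for illustration, since the actual symbols are those of Figure 1. It also shows that "int a" and "int b" reduce to the same symbol sequence.</p>
        <preformat>
```python
import re

# Hypothetical symbol table; the paper's real symbols are defined in its Figure 1.
KEYWORDS = {"import", "package", "if", "else", "new", "return"}
TYPES = {"int", "double", "boolean", "char", "void", "String"}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def to_symbols(line):
    """Map each token of one Java line to a coarse symbol."""
    symbols = []
    for token in TOKEN_RE.findall(line):
        if token in KEYWORDS:
            symbols.append(token)      # keywords keep their own spelling
        elif token in TYPES:
            symbols.append("type")
        elif token[0].isalpha() or token[0] == "_":
            symbols.append("id")
        elif token.isdigit():
            symbols.append("num")
        else:
            symbols.append(token)      # punctuation and operators
    return symbols


def line_role(line):
    """Guess the role of a line from the start of its symbol sequence."""
    s = to_symbols(line)
    if s[:1] in (["import"], ["package"]):
        return "package/import"
    if s[:1] == ["if"] or s[:2] == ["else", "if"]:
        return "if or else-if"
    if s[:3] == ["type", "id", "("]:
        return "method definition"
    if s[:3] == ["type", "id", "="] or s[:3] == ["type", "id", ";"]:
        return "variable definition"
    if s[:2] == ["id", "("]:
        return "method call"
    return "other"
```
        </preformat>
        <p>For example, line_role("method();") yields "method call" and line_role("int a=1;") yields "variable definition". Matching only the first few symbols is a simplification; a fuller rule set would need to cover modifiers, generics, and object creation.</p>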
      </sec>
      <sec id="sec-1-3">
        <p>This result shows that specific symbol sequences can identify the role of each line.</p>
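        <p>Because symbol sequences identify the role of a line, translation can be driven by a table lookup over those sequences. Steps 1 to 5 of the approach can be sketched in Python as follows. This is a sketch under stated assumptions, not the prototype's code: the TEMPLATES table, its wording, the symbol names (keyword, id), and the functions symbol_of, translate_line, and translate are hypothetical, and the comment stripping ignores comment markers that occur inside string literals.</p>
        <preformat>
```python
import re

KEYWORDS = {"new", "return", "if", "else", "import"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Line comments and block comments (simplified: string literals are not handled).
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

# Hypothetical pattern-to-template table; {n} is the n-th token of the match.
TEMPLATES = {
    ("id", "id", "="): "create {0} type variable {1} and assign with",
    ("keyword", "id"): "{1} object",
    ("(", "id", ".", "id", ")"): "with parameters ( {1} . {3} )",
    (";",): "",
}
MAX_LEN = max(len(pattern) for pattern in TEMPLATES)


def symbol_of(token):
    """Steps 2 and 3: replace a token by its symbol, keeping punctuation."""
    if token in KEYWORDS:
        return "keyword"
    if token[0].isalpha() or token[0] == "_":
        return "id"
    return token


def translate_line(line):
    """Steps 4 and 5: longest-sequence-first match, then fill the template."""
    tokens = TOKEN_RE.findall(line)
    symbols = [symbol_of(t) for t in tokens]
    parts, i = [], 0
    while len(tokens) > i:
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            template = TEMPLATES.get(tuple(symbols[i:i + size]))
            if template is not None:
                parts.append(template.format(*tokens[i:i + size]))
                i += size
                break
        else:
            parts.append(tokens[i])    # no pattern: keep the token, move on
            i += 1
    return " ".join(p for p in parts if p)


def translate(source):
    """Step 1: drop comments and blank lines, then translate line by line."""
    code = COMMENT_RE.sub("", source)
    lines = [ln for ln in code.splitlines() if ln.strip()]
    return [translate_line(ln) for ln in lines]
```
        </preformat>
        <p>With this table, the line "Scanner scan = new Scanner(System.in);" is matched as (id id =), (keyword id), and ( id . id ), in that order, and each match is rendered by its template. Trying longer patterns before shorter ones keeps a short pattern from consuming the start of a longer one.</p>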
      </sec>
    </sec>
    <sec id="sec-2">
      <title>6. Conclusion</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Oda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fudaba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Neubig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sakti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Toda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nakamura</surname>
          </string-name>
          ,
          <article-title>Learning to generate pseudo-code from source code using statistical machine translation</article-title>
          ,
          <source>in: 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE)</source>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>574</fpage>
          -
          <lpage>584</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Automatic generation of pseudocode with attention seq2seq model</article-title>
          ,
          <source>in: 2018 25th Asia-Pacific Software Engineering Conference (APSEC)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>711</fpage>
          -
          <lpage>712</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Fine-grained pseudo-code generation method via code feature extraction and transformer</article-title>
          ,
          <source>in: 2021 28th Asia-Pacific Software Engineering Conference (APSEC)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Gad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alokla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nazih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aref</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <article-title>Dlbt: deep learning-based transformer to generate pseudo-code from source code</article-title>
          ,
          <source>CMC-Comput. Mater. Contin.</source>
          <volume>70</volume>
          (
          <year>2022</year>
          )
          <fpage>3117</fpage>
          -
          <lpage>3132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barmpoutis</surname>
          </string-name>
          ,
          <article-title>Learning programming languages as shortcuts to natural language token replacements</article-title>
          ,
          <source>in: Proceedings of the 18th Koli Calling International Conference on Computing Education Research</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alhefdhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. K.</given-names>
            <surname>Dam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghose</surname>
          </string-name>
          ,
          <article-title>Generating pseudo-code from source code using deep learning</article-title>
          ,
          <source>in: 2018 25th Australasian Software Engineering Conference (ASWEC)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Belwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>Is the corpus ready for machine translation? a case study with python to pseudo-code corpus</article-title>
          ,
          <source>Arabian Journal for Science and Engineering</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alokla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nazih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aref</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-B.</given-names>
            <surname>Salem</surname>
          </string-name>
          ,
          <article-title>Retrieval-based transformer pseudocode generation</article-title>
          ,
          <source>Mathematics</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>604</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
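          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>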
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>