<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Digitization of Konkani Texts, and their Transliteration: An Initiative towards Preservation of a Language Culture</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ms. Palia Tukaram Gaonkar</string-name>
          <email>eng.palia@unigoa.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dr. Andre Rafael Fernandes</string-name>
          <email>rafael@unigoa.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Associate Professor, Department of English, Goa University</institution>
          ,
          <addr-line>Goa</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Doctoral Research Scholar, Department of English, Goa University</institution>
          ,
          <addr-line>Goa</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <fpage>149</fpage>
      <lpage>156</lpage>
      <abstract>
        <p>Konkani is the official language of the state of Goa, located on the western coast of India. The language has faced many political threats, such as four hundred and fifty years of Portuguese colonization and, after Liberation in 1961, contention with Marathi over recognition as the official language. It finally entered the Eighth Schedule of the Constitution of India in 1992. These hardships have diversified the nature of Konkani. It is spoken in several dialects in Goa and elsewhere, and is written in five different scripts, owing to the migration of Konkani people from Goa over the centuries. There are Konkani communities in the neighbouring states of Karnataka, Kerala and Maharashtra, which are heavily influenced by the dominant local culture. Hence, Konkani is written in the Devanagari, Roman, Kannada, Malayalam and Perso-Arabic scripts. This phenomenon creates a linguistic and literary gap in the community of Konkani speakers. Persistent efforts have been made to bridge the gap, one of which draws on technology. The World Konkani Centre, situated in Mangalore, Karnataka, has developed a transliteration tool, Konkanverter, which transliterates Unicode text between four of the five writing systems of Konkani, namely Devanagari, Roman, Kannada and Malayalam. This paper reports an attempt to digitize an available performance play text in Konkani and to transliterate it from one script to another. It also explores digitization as a way of preserving Konkani texts in multi-script formats.</p>
      </abstract>
      <kwd-group>
        <kwd>Konkani</kwd>
        <kwd>Digitization</kwd>
        <kwd>Transliteration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The official language of Goa, Konkani, which is also spoken in pockets of Karnataka, Kerala
and Maharashtra, showed a negative growth rate according to the Census of 2011 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Only 0.19% of the total population of India claimed it as their mother
tongue; what is more alarming is that the number of native speakers
decreased from 24,89,015 to 22,56,502, a growth rate of -9.34% [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
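      <p>The growth rate quoted above can be checked directly from the two speaker counts. The short Python sketch below is illustrative only; the function name decadal_growth_rate is introduced here for the example and does not come from the census source.</p>

```python
def decadal_growth_rate(earlier: int, later: int) -> float:
    """Return the percentage change from the earlier count to the later one."""
    return (later - earlier) / earlier * 100


# Speaker counts quoted in the text (24,89,015 and 22,56,502 in Indian digit grouping).
earlier_speakers = 2_489_015
later_speakers = 2_256_502

rate = decadal_growth_rate(earlier_speakers, later_speakers)
print(f"{rate:.2f}%")  # prints -9.34%
```

      <p>The result agrees with the figure reported in the census statement: a loss of 2,32,513 speakers relative to the earlier count.</p>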
      <p>
        Adding to this apprehension, The Telegraph, a noteworthy Indian national daily,
published an interview with language expert Professor G N Devy in August 2017, after
he had published several volumes of the People's Linguistic Survey of India. In it
he observed that “Nearly 400 of India's 850-odd languages face the threat of
extinction because of an erosion of traditional jobs that is fuelling migration to cities,” and
that “the languages spoken in the coastal areas will be the worst-affected” [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Goa has been through the perils of colonization, mechanization, migration and so
forth. In addition, the Konkani language has the linguistic peculiarity of being
written in five different scripts, with two or more scripts in use in the same
geographical region. The literature created in each of these scripts remains restricted to readers
who are familiar with that script, and where readers live largely determines which scripts
they can read. Very few can therefore read three scripts, let alone
four or five.</p>
      <p>It is remarkable, however, that the cultural and linguistic identity of Konkani has been
preserved quite fervently by its people, despite the grave threats that have appeared time and
again to erase it. The Konkani people carried their language, their deities and their
culture to the places they migrated to, and planted their cultural heritage in alien soil.
Yet in a scenario where technological tools greet end-users in global
languages, a minority language can easily be bypassed. Although close to nine of the
twenty-two scheduled languages recognized by the Constitution of India are featured
by Google, Konkani has so far not been one of them. Google Indic
Keyboard does, however, provide a Konkani-English interface, with a limited corpus, on Android
smartphone keyboards.</p>
      <p>If a small language like Konkani is to survive the invasion of technology and its
mission to standardize communication, it needs to adopt technology to preserve its
diversity, and subsequently its culture and heritage. This study aims to explore the
possibility of having a cross-orthographical readership of Konkani using the
transliteration tool known as Konkanverter, which will not only bridge the orthographic
disparity and increase readership and production of literature, but also contribute to the
creation of a Konkani digital archive.</p>
    </sec>
    <sec id="sec-2">
      <title>Literature Review</title>
      <p>Due to the migration of people from Goa over the centuries, Konkani diasporic
communities exist in Karnataka, Kerala, and Maharashtra. It is not surprising that the
Konkani spoken in each of these states carries the flavour of its regional language:
Kannada, Malayalam and Marathi respectively. The local
scripts have likewise been adopted by the Konkani communities in these regions.</p>
      <p>The two dominant scripts of Konkani in Goa are Devanagari and Roman, or Romi
as it is colloquially known. Devanagari orthography is considered the official one;
Romi is used by select weeklies and magazines and for Tiatr (drama) scripts.</p>
      <p>
        Rajan terms this phenomenon of multiple scripts “synchronic trigraphia” and
goes on to elucidate it as “a major issue of political contention inside the community,
each group favouring the usage of a particular script as the official orthography.
Different orthographic communities exist in isolation with minimal interaction and with
its [sic] own literary tradition, as very few people are fluent in multiple
orthographies” [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Although these communities may have elementary proficiency in reading
other writing systems, this multiplicity of scripts creates an obstacle to wider reading.
Hence, both literature and language become less accessible to the native speakers of a
single region: “This disparity in scripts creates intra-lingual barriers which make
writings in Konkani inaccessible even to a native Konkani speaker who has limited or no
knowledge of all these five scripts” [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Rajan suggests that a “statistical machine transliteration engine with reasonable
accuracy would greatly enable cohesion and interaction among the greater linguistic
community. Facilitating the usage of multiple scripts would also encourage more
linguistic diversity among the community” [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Rajan further goes on to present the
development of such a tool, Konkanverter (http://konkanverter.com/) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which has
been used extensively for this study.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>
        The methodology used in this study is exploratory in nature. As Dudovskiy puts it,
“Exploratory research, as the name implies, intends merely to explore the research
questions and does not intend to offer final and conclusive solutions to existing
problems” [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Exploratory research provides insights into a given situation and produces
qualitative findings that can become the basis of further research in the specified study
area. It involves planning (deciding upon the specific research questions),
exploration (data collection), and reflection and analysis (data interpretation and
observations) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>In the context of this study, exploratory research helps to discuss the benefits of,
and the challenges in, answering the major research question, i.e. whether
digitization can pave the way to language preservation. The present study follows this
methodology in a way that is applicable to the discipline. It explores
the possibilities of using technology to safeguard the linguistic and cultural heritage of a
language community, and paves the way for further research in this area. The data
and analysis presented in this study are first-hand; revision can be taken up as part of
the next level of research in the same area.</p>
    </sec>
    <sec id="sec-4">
      <title>Transliteration of Konkani Play “Shree Vichitrachi Jatra”</title>
      <p>
        World Konkani Centre [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Mangalore, along with computer scientist Vinodh Rajan,
has introduced a “finite state transducer based transliteration engine” [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] known as
Konkanverter. This tool transliterates between four popular orthographies of
Konkani, namely Devanagari, Romi, Kannada and Malayalam [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The present research involved the transliteration of a well-known Konkani play, “Shree
Vichitrachi Jatra”, written by Pundalik Naik. The play belongs to the national
award-winning drama collection Chourang (1982) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The selection of the text was based
on its popularity and critical acclaim; its availability in the digital format also
facilitated its conversion.
      </p>
      <p>
        The digital copy of the play was obtained in Devanagari. The text contained lexical
errors, although these were mostly not phonetic. Furthermore, the font
used was not a Unicode font, and since Konkanverter accepts only Unicode text, an
intermediate software tool known as Baraha FontConvert [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] was used. Since
FontConvert limits the amount of input text, the conversion had to be carried out
over several sessions. After the font was converted using “Shree-Dev-0709&lt;==&gt;BRH
Devanagari”, the Unicode output was pasted into the left-hand text
input box in Konkanverter. On clicking the conversion tab, the transliterated text
appeared in the right-hand text output box (see Figure 1). This text was then
pasted into Notepad.
      </p>
      <p>It was observed that the source text font (Shree-Dev) could not be converted
accurately to Unicode, and hence some of the letters were wrongly converted.
These errors were carried forward into the transliteration tool as well. Table 1
shows the source text (ST), followed by the text converted into Unicode (UT)
and then the transliterated text (TT). The error in the
UT (ङ्ख) is phonetically unreadable and is transferred to the TT. The expected
output is displayed in the fourth row of the table.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <tbody>
            <tr>
              <td>ST</td>
              <td>कट्ी, झोळी, पिषवी सगळें एकूच तर तूं ताचे पिषवेंत एक फुटकी कवडी वडयना आनी वयल्यान ताचीं फकाणाूं करता?</td>
            </tr>
            <tr>
              <td>UT</td>
              <td>कट्ी, झोळी, पिषवी सगळें एकूच तर तूं ताचे पिषवेंत एक ङ्खुटकी कवडी वडयना आनी वयल्यान ताचीं ङ्खकाणाूं करता?</td>
            </tr>
            <tr>
              <td>TT</td>
              <td>kott'tti, zholli, pixvi sogllem ekuch tor tum tache pixvent ek fkhuttki kovddi voddoina ani voilean tachim fkhokannam korta?</td>
            </tr>
            <tr>
              <td>Expected output</td>
              <td>Kott'tti, zholli, pixvi sogllem ekuch tor tum tache pixvent ek futtki kouddi voddoina ani voilean tachim fokannam korta?</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>As Table 1 shows, the spurious 'kh' is added
at the first stage of conversion, i.e. when the digital source text is converted into
Unicode.
      </p>
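      <p>Two steps of the workflow described above lend themselves to a small script: checking whether a text is already Unicode Devanagari, and splitting a long text into portions that fit an input limit. The Python sketch below is a minimal illustration and not part of Baraha FontConvert or Konkanverter. It assumes, as is typical of legacy Indic fonts, that non-Unicode text is stored at code points outside the Devanagari block; the function names and the CHUNK_LIMIT value are hypothetical, since the actual FontConvert limit is not stated here.</p>

```python
import unicodedata

# Unicode Devanagari block, U+0900 to U+097F inclusive.
DEVANAGARI_BLOCK = range(0x0900, 0x0980)

# Hypothetical limit; the actual FontConvert limit is not stated in the paper.
CHUNK_LIMIT = 2000


def devanagari_ratio(text: str) -> float:
    """Return the share of letters and marks that lie in the Devanagari block."""
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    inside = [ch for ch in relevant if ord(ch) in DEVANAGARI_BLOCK]
    return len(inside) / len(relevant)


def split_into_chunks(text: str, limit: int) -> list[str]:
    """Split text at line boundaries so that chunks stay within `limit` characters.

    A single line longer than `limit` is kept whole rather than cut mid-line.
    """
    chunks = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the limit.
        if current and len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


sample = "झोळी\n" * 3
print(devanagari_ratio(sample))
print(len(split_into_chunks(sample, CHUNK_LIMIT)))
```

      <p>A ratio close to zero for a text that displays as Devanagari would suggest a legacy font encoding, in which case the text needs font conversion before it is passed to a Unicode-only tool.</p>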
      <p>Table 2 examines ten randomly chosen errors found in the transliterated text, and discusses
the consistency of errors in the output. It shows the column-wise progression from the
source text, copied and pasted into the font conversion tool (see Table 2, column 'Source
Font'), to the Unicode output (see Table 2, column 'Unicode Output Text').
This Unicode output text was copied and pasted into Konkanverter to give its
transliteration (see Table 2, column 'Transliterated Output Text'). The last
column of Table 2, 'Expected Transliteration', was obtained by typing the
corresponding words of the 'Source Text' column directly in Unicode into the
Konkanverter input text box; this output is not influenced by font conversion
and hence is the expected machine transliteration.</p>
      <p>As Table 2 shows, the letters फ (fa) and म (ma) seem to have been
converted erroneously throughout the text; other major erroneous conversions involve
conjoined half letters, such as म्हळ्यार - म्हळार; गाड्याक - गाडाक; वाट्याचें - वाटाचें. Some seem
to be random errors (UT: आसला, तळु, काणेतरी) that did not recur.</p>
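      <p>The comparison between the transliterated output and the expected transliteration can also be made mechanically. The Python sketch below pairs the words of two Roman strings by position and lists those that differ; it uses the TT and expected-output strings reported in Table 1 as sample data. The helper name mismatched_words is introduced here for illustration, and the position-wise pairing assumes that both strings contain the same number of words.</p>

```python
def mismatched_words(actual: str, expected: str) -> list[tuple[str, str]]:
    """Pair words by position and return the pairs that differ, ignoring case."""
    actual_words = actual.split()
    expected_words = expected.split()
    if len(actual_words) != len(expected_words):
        raise ValueError("texts must contain the same number of words")
    return [
        (a, e)
        for a, e in zip(actual_words, expected_words)
        if a.lower() != e.lower()
    ]


# Transliterated text (TT) and expected output as reported in Table 1.
transliterated = (
    "kott'tti, zholli, pixvi sogllem ekuch tor tum "
    "tache pixvent ek fkhuttki kovddi voddoina ani "
    "voilean tachim fkhokannam korta?"
)
expected = (
    "Kott'tti, zholli, pixvi sogllem ekuch tor tum "
    "tache pixvent ek futtki kouddi voddoina ani "
    "voilean tachim fokannam korta?"
)

for got, want in mismatched_words(transliterated, expected):
    print(f"{got} -> {want}")
```

      <p>Counting how often the same mismatch recurs over a longer text would help to separate consistent conversion errors from random ones.</p>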
      <p>
        The entire play was thus transliterated into Romi, and errors like those above (see
Table 2) had to be corrected manually. At the time of writing, it is
not known whether these discrepancies would carry over into Kannada or
Malayalam, as the present researchers are not familiar with the spoken dialects or the scripts
of these regions. Rajan [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] gives the table for accuracy of the transliteration as
follows:
      </p>
      <p>The above transliteration accuracy rates apply to a text
that is lexically accurate. A limitation of this study was that the source text was not
proofread for spelling errors, which increased the inaccuracy of the transliteration.
Therefore, Konkani language experts need to make sure that a text is thoroughly
proofread after digitization and that its font is converted to Unicode, which will
support faithful transliteration.</p>
      <p>For greater accuracy in transliteration, the ST should be proofread and made
available in Unicode, which will render the intermediate font conversion redundant.
With only one conversion engine between the input and the output text, greater
accuracy can be achieved more readily.</p>
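      <p>The reasoning behind this claim can be made explicit under a simplifying assumption that is not tested in this study: a word reaches the output correctly only if every stage handles it correctly, and the stages err independently of each other. The symbols below are introduced purely for illustration.</p>

```latex
% a_f: share of words converted correctly by the font converter (assumed)
% a_t: share of words transliterated correctly from clean Unicode input (assumed)
\[
  A_{\text{two-stage}} = a_f \cdot a_t \;\le\; a_t = A_{\text{one-stage}},
  \qquad 0 \le a_f \le 1 .
\]
```

      <p>Removing the font conversion stage corresponds to setting a_f = 1, so under this assumption the overall accuracy can only stay the same or improve.</p>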
      <p>
        Such an analysis of Konkanverter is incomplete without reference to Google's
contribution to the field of transliteration. Transliteration is featured as one of the Google
Input Tools [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which offers on-the-fly Devanagari options for the Roman rendition of a
word. However, a Devanagari rendition in Unicode does not get converted to its
Roman equivalent; transliteration thus takes place in one direction only. Moreover, the
Devanagari script provision covers only Hindi and Marathi; Konkani is not
featured among the options. As far as Google Indic Keyboard on Android smartphones is
concerned, Konkani input is available in Roman and Devanagari, although
transliteration takes place only from Roman to Devanagari Konkani.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Accepting only Unicode text limits the usage of Konkanverter, as it
discourages input of any other kind, such as optically recognized characters
and voice input. Voice input is further deterred because the text is not converted on the fly.</p>
      <p>Nevertheless, Konkanverter pushes the boundaries of learning of different scripts
of the same language, and makes multi-script archiving possible; it not only bridges
the gap between the two dominant orthographic Konkani communities, but also
establishes a kind of digital footprint for a language which could face the risk of becoming
endangered. Such a tool can become an inspiration for minority languages across the
globe to enhance their digital presence, and hence safeguard their identity in the
digital revolution.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Ministry of Home Affairs, Government of India.
          <year>2013</year>
          .
          <source>Statement - 7 Growth of Scheduled Languages - 1971, 1981, 1991, 2001 and 2011</source>
          . Office of the Registrar General &amp; Census Commissioner, India. http://www.censusindia.gov.in/2011Census/Language_MTs.html, last accessed 2018/10/10.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Mohanty</surname>
            ,
            <given-names>B. K.</given-names>
          </string-name>
          :
          <article-title>Extinction alert on 400 languages</article-title>
          .
          <source>The Telegraph: Online edition</source>
          . https://www.telegraphindia.com/india/extinction-alert-on-400-languages/cid/1521283, last accessed 2018/10/05.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Rajan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Konkanverter - A Finite State Transducer based Statistical Machine Transliteration Engine for Konkani Language</article-title>
          .
          <source>Proceedings of the 5th Workshop on South and Southeast Asian NLP, 25th International Conference on Computational Linguistics</source>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>19</lpage>
          . Dublin, Ireland (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gaonkar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandes</surname>
            ,
            <given-names>A. R.</given-names>
          </string-name>
          :
          <article-title>ICT in Language Teaching Analysis of Select Software</article-title>
          .
          <source>Proceedings of the International Conference on Trends and Innovations in Language Teaching</source>
          , pp.
          <fpage>221</fpage>
          . Sathyabama University, Chennai (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <source>Konkanverter</source>
          . World Konkani Centre, Version 2.0. World Konkani Centre Mangalore. http://konkanverter.com/, last accessed 2014/12/18.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dudovskiy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Exploratory Research</article-title>
          .
          <source>Research Methodology</source>
          . https://research-methodology.net/research-methodology/research-design/exploratory-research/#_ftn2, last accessed 2018/10/02.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rebolledo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>A Handbook for Exploratory Action Research</source>
          . British Council (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. World Konkani Centre. World Konkani Centre Mangalore. http://vishwakonkani.org/, last accessed 2018/10/19.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Naik</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Chourang</source>
          . Apurbai Prakashan, Goa (
          <year>1982</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. FontConvert. Baraha Indian Language Software. Baraha Software (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Google Input Tools. Google. https://www.google.com/inputtools/try/, last accessed 2019/02/01.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>