Testing Tools for Writing and Publishing E-dictionaries

Tetiana Anokhina (a), Iryna Kobyakova (b) and Svitlana Shvachko (b)
(a) National Dragomanov Pedagogical University, Turgenivska 8/14, Kyiv, 01054, Ukraine
(b) Sumy State University, Rimsky-Korsakov, 2, 40000, Ukraine

Abstract
As we build a new electronic community to serve our needs for linguistic data, we keep looking for better ways to store informational resources and smarter ways to use localization materials, software profiles, medical instructions and legal documentation. In the new digital era, applied linguists and translators need the skills to write and publish their own e-dictionaries so that these can be used with other compatible systems. To work with this data flow, we test publishing systems that are likely to help translators compile personal dictionaries based on e-texts or corpora. Testing open-source systems offers a wide perspective for applied linguists and translators: it enables them to develop their own HTML schema or to work with available templates, without changing the underlying code, while building their e-dictionaries.

Keywords
Lexonomy, HTML schema, a simple dictionary, publishing on the web

1. Introduction
The reason we are testing e-dictionaries is the new digital era we have entered. As we build a new electronic community to serve our needs for using and storing linguistic data, we keep looking for better ways of learning foreign languages and smarter ways to translate various documents (localization materials, software profiles, medical instructions, legal documentation, etc.) without wasting time on translations whose earlier versions are already stored in our translation memory. The idea of creating dictionaries, large and small, was formulated by the European lexicographic tradition.
From the 16th century onwards, dictionary writers, who did not yet know they were lexicographers, processed pre-existing materials, reconsidered and published them, and thus gave rise to paper dictionaries and, eventually, their digital counterparts [13]. All this time publishers and printers helped to publish and innovate paper dictionaries and what we now call "their digital counterparts of today" [13]. Legacy dictionaries, great encyclopedic dictionaries and other old books were an enormous source of knowledge. From time to time their writers offered opinions that reflect the worldview of their era. Dictionaries contain lexical and encyclopedic information and provide a window into the past; importantly, they record language change.

COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems, April 22-23, 2021, Kharkiv, Ukraine
EMAIL: anokhina_mail@yahoo.com (T. Anokhina); kobyakova@ukr.net (I. Kobyakova); s.shvachko@ifsk.sumdu.edu.ua (S. Shvachko)
ORCID: 0000-0002-8859-5568 (T. Anokhina); 0000-0002-9505-2502 (I. Kobyakova); 0000-0002-2119-1884 (S. Shvachko)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

We now have access to the European heritage in the digital era. Resources such as the BNC [1] and the corpus-based platform Sketch Engine provide information about the dynamic change of languages and offer online resources that are accessible both for free and on a prepaid basis. Among such projects are LandLex (legacy dictionaries) and Lexonomy (a platform compatible with Sketch Engine that enables compiling personal dictionaries) [4, 6].
Corpora offer a wide range of opportunities for deeper analysis of keywords and keywords in context, and they let you add large amounts of linguistic data to your own dictionary automatically. Paper books alone no longer meet our needs: information must be searchable online, and new dictionaries can be shared and displayed on the web. So we are letting old books rest in peace and moving to the era of e-dictionaries, which is very demanding with respect to the visualization of linguistic data. Today's linguists must be ready to format their dictionaries and make them public. They experiment with compiling dictionaries in order to find the best format for their content. Applied linguists work with datasets and acquire the necessary skills and tools for writing and publishing their e-dictionaries. To work with this data flow, they test publishing systems that are likely to help them compile dictionaries based on e-texts or corpora. Testing open-source systems offers a wide perspective for students: it enables them to develop their own HTML schema or to work with available templates without changing the underlying code while building their e-dictionaries [6]. Responding to today's needs, electronic dictionary systems keep developing, and new products show the progress of this experimentation. In addition, it is popular to use open-source dictionaries so that more and more people become familiar with encoding formats and other technical information presented in the most computationally accessible form. This new look at linguistic data, together with the flexibility of electronic templates, allows us to add entries to an existing HTML schema, which can be shortened, lengthened or rearranged [4]. Today we must think about the variety of codes and languages to use for the data flow, in our case for compiling linguistic data into electronic dictionaries.
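To illustrate the kind of keyword-in-context (KWIC) analysis corpora provide, here is a minimal sketch; the sample sentence, function name and window size are purely illustrative:

```python
# Minimal keyword-in-context (KWIC) sketch: given a tokenized text,
# collect each occurrence of a keyword with a fixed window of context.

def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) tuples for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

text = "A dictionary entry links a headword to a sense and a dictionary example".split()
for left, kw, right in kwic(text, "dictionary"):
    print(f"{left:>30} | {kw} | {right}")
```

A real corpus query engine would of course index the corpus first, but the windowed-context idea is the same.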
We live in an age when paper formats alone are no longer sufficient. This is the era of digitalization, with its focus on distance learning and on electronic tools that raise our work to a higher level. The progress we make can be demonstrated on screen as we upload lecture material on applied linguistics using available resources [1, 10, 11, 15]. The new activity we are aiming at covers a large scope of linguistic studies, including diachronic studies of languages. Since corpora now give access to the electronic data of the historical heritage, applied linguists can build both modern and historical dictionaries, including thesaurus and etymological features if they wish. A virtue of the electronic format is that it can easily be transformed into a paper format. The LaTeX toolchain, for instance, makes it possible to prepare your own dictionary using existing templates. In addition, LandLex has developed a new electronic vocabulary, E-Lex, which can be integrated into the Sketch Engine environment and built on corpus-based dictionaries [1, 2, 4]. The electronic vocabulary E-Lex is the result of automated programs that demonstrated new ways of processing corpus data and that can help in developing new electronic vocabularies. Developed with the help of programmers, it can be used by a wide range of users. New ways of handling data can now support the development of new e-dictionaries. As we approach the electronic era, we find better solutions in our lexicographical practice, notably in handling topics such as flora and fauna. Cross-linguistic diachronic analyses carried out within electronic dictionaries have allowed us to experiment with new models such as European Roots. They have allowed us to develop new analytical methods for looking at image data using CAQDAS, followed by full digitization using TEI XML.
Multilingual lexicographical prototypes are offered as a means of making lexical variation apparent and of making information available without imposing a single language as a hub [13].

2. Electronic dictionaries: insights into the problem of creating an e-dictionary

The problem of extending the possibilities of electronic lexicography is becoming more relevant. Applied linguists need to learn how to use electronic dictionaries and how to create their own vocabularies on the basis of existing tools, both for independent use and for online publication with open access to the resource. We are constantly learning new words, and a personal e-dictionary is simply a new way of storing that information: a small or large electronic database, an electronic dictionary in Lexonomy, or a LaTeX dictionary (a preformatted .tex version ready for PDF printing). It is possible to design your own schema based on what you plan for your dictionary. You find and use the tool you need at the moment and compile a modern electronic resource. Electronic dictionaries can be created manually or on the basis of various corpora. The prepaid version of Sketch Engine offers the opportunity to generate an e-dictionary automatically, since it is associated with a sketch grammar that allows verbs' word sketches to be arranged by argument structure in Sketch Engine. Ideally, a word sketch would be built over million-word corpora that have been PoS-tagged in recent years [7]; additional corpora, large and small, are also being created from scratch [8]. The information is mapped onto the entry template, which is arranged by argument structure. Once the template is ready, we go to the Lexonomy page (https://www.lexonomy.eu/), a free dictionary writing tool closely connected with Sketch Engine. Lexonomy allows users to edit entry templates easily, and these can be auto-populated with information from a corpus hosted on Sketch Engine [4].
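As a sketch of what such an entry template looks like under the hood, the following fragment builds one entry as XML with the Python standard library. The element names (entry, headword, partOfSpeech, sense, translation) are an assumption modelled on Lexonomy's simple-dictionary examples and should be adjusted to the schema actually in use:

```python
# Sketch of building an entry in a Lexonomy-like XML schema.
# Element names are assumed, not Lexonomy's fixed API; adjust to your schema.
import xml.etree.ElementTree as ET

def make_entry(headword, pos, translation):
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "partOfSpeech").text = pos
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "translation").text = translation
    return ET.tostring(entry, encoding="unicode")

print(make_entry("bully", "noun", "булер"))
```

Because Lexonomy entries are arbitrary XML, any schema designed this way can be pasted into the editor or generated in bulk.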
Lexonomy's out-of-the-box configuration allows users to pull up examples from a Sketch Engine corpus into individual example slots in each entry. This practice requires lexicographers to select examples manually and add them to the entries, which is time-consuming. We can find illustrations in the available corpora and send them to the entry slots, which is a good option. Still, if we need help in advancing our dictionary, the assistance of the Sketch Engine team is available for a reasonable fee.

2.1. The European lexicographical tradition

Large and small dictionaries have formed the European lexicographical tradition and become a source of knowledge and inspiration. Beginning in the 16th century, dictionary writers who did not yet know they were lexicographers processed reference materials, considered, published and innovated, and thus spawned paper dictionaries and their modern digital counterparts [13]. The problem of studying the possibilities of electronic lexicography remains relevant. Applied linguists must learn how to use electronic dictionaries and create their own dictionaries on the basis of existing tools, both for independent use and with the possibility of hosting them on the Internet, giving a translator access to the dictionary in a cloud environment.

2.2. The translation systems compatible with e-dictionaries

Modern translators work with electronic databases of linguistic data, acquiring the necessary knowledge and skills and mastering new tools for compiling and publishing their electronic dictionaries. To work with electronic data, future professionals need to understand what a modern electronic dictionary should look like, test tools for generating electronic dictionaries, and create an electronic dictionary of their own. We refer here to translation systems that have such capabilities as the most modern ones, those that should be useful to a freelance or team translator.
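Terminology is commonly moved between such translation systems in TMX-style exchange files. The following sketch writes a two-term glossary in that spirit; the minimal structure (header/body/tu/tuv/seg) follows TMX conventions, but real tools may require additional attributes, so treat this as an assumption-laden illustration rather than a conformant exporter:

```python
# Sketch: serialize a small bilingual glossary as a TMX-like XML document
# so terms can be shared between translators. Structure follows TMX
# conventions (header/body/tu/tuv/seg); real tools may expect more metadata.
import xml.etree.ElementTree as ET

def glossary_to_tmx(pairs, src="en", tgt="uk"):
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", srclang=src, datatype="plaintext")
    body = ET.SubElement(tmx, "body")
    for s, t in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per term pair
        for lang, seg_text in ((src, s), (tgt, t)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = seg_text
    return ET.tostring(tmx, encoding="unicode")

xml_out = glossary_to_tmx([("dictionary", "словник"), ("entry", "стаття")])
print(xml_out)
```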
In the first stage, translators should test electronic systems that are likely to help them compile dictionaries based on electronic texts or corpora. Testing open-source systems offers a wide perspective for students and will allow them to develop their own HTML schema or work with existing templates without changing the existing code when creating their electronic dictionaries. The purpose of our investigation is to describe the results of testing electronic lexicography systems that can be integrated into a corpus environment or are compatible with machine translation systems, whose popularity is gaining momentum. The required skills will be useful for creating glossaries of various structures, which can then be reused for each specific project. Basic skills include the ability to create electronic dictionaries based on existing templates and schemas, to edit the e-dictionary, to add new terms, and to use additional dictionaries in machine translation systems such as Wordfast and Trados, which are located in a cloud environment. Basic skills also include the ability to upload your dictionary in order to exchange terminology with other translators, colleagues and freelancers. The cloud environment itself allows the accumulation of large terminological databases that can be created and corrected instantly, with the changes saved in the cloud.

3. The ABBYY Lingvo family of electronic dictionaries

The best-known electronic dictionaries include the translation systems by ABBYY Lingvo. These dictionaries are designed for both desktop and mobile devices and are an indispensable offline aid for the modern translator. The ABBYY Lingvo dictionary will be useful for those who translate or teach English, German, Spanish, French and other languages. The professional version includes all available thematic dictionaries on economics, law, medicine, oil refining, mechanical engineering, etc.
The ABBYY Lingvo family of electronic dictionaries contains additional built-in authentic monolingual dictionaries (e.g. Collins COBUILD) with up-to-date English vocabulary. The system provides the high-quality, instant translation that most modern users expect. All you have to do is hover over a still-unknown word in an email, a movie subtitle or a PDF file. From the translation window you can quickly add the word to the application, as well as view its transcription or listen to it if you wish. Importantly, the application helps to memorize new words: you can expand your vocabulary with the universal ABBYY Lingvo Tutor program. The user can enter an unknown word (e.g. Eng. bully), enter its translation (e.g. Ukr. булер), and regularly replenish their electronic dictionary with other terminological and neological units. To make this process more efficient, the application contains ready-made dictionaries of basic vocabulary for English, German, French, Spanish, Italian and Portuguese. The lexical items in these electronic dictionaries are presented by frequency of use and sorted thematically (business vocabulary, weather, etc.). In today's world, with constant access to the Internet, vocabularies located in cloud environments are indispensable. Multitran's cloud dictionary, with the ability to enter your own terms, can be useful for translators who use the translation base to check the accuracy of term translations across different subject groups. The modern world forces us to look at lexicographic data in a new way, and lexicographers are experimenting with new ways of presenting information. We find Lexonomy a very useful tool. As Michal Měchura states, Lexonomy is a web-based platform for writing and publishing dictionaries. Its mission is to be an easy-to-use tool for small to medium-sized dictionary projects.
In Lexonomy, individuals and teams can create a dictionary, design an arbitrary XML structure for the entries, edit entries, and eventually make the dictionary publicly available as a 'microsite' within the Lexonomy website. Lexonomy exists in order to lower the barriers of entry into modern born-digital lexicography. Compared to other dictionary writing systems, it requires no installation or set-up, expects no knowledge of coding or programming, and is free of financial cost. It is simply a website where lexicographers can sign up and start working [6]. Reversing a bilingual dictionary has already been discussed by Michal Měchura and other scholars [5], and we follow this tendency to work with electronic lexicography [4, 6]. Here, however, it is interesting to follow the process of compiling a simple bilingual dictionary step by step. Figure 1 shows how entries are added from the existing template.

Figure 1: Lexonomy platform: entry

During the compilation of the dictionary, we add the translation into the sense area. The manual work takes time, but the result is satisfying. If you use the Sketch Engine tool, the speed of compiling your dictionary will be higher, at least at the first step of entering new entries. The advantage of automatically compiled entries is that little manual work is needed: we set the number of entries and the source, and the system adds entries from the Sketch Engine database (which is really huge) to your Lexonomy dictionary (Figure 2).

Figure 2: Lexonomy platform: English-Ukrainian pairs

The polished template elaborated by the team of programmers makes your dictionary look really impressive: no paper dictionary can be compared with a preformatted, well-designed electronic dictionary (Figure 3).

Figure 3: Lexonomy platform: English-Ukrainian pairs

The user can format the existing template using the embedded option.
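A bilingual dictionary of this kind can be modelled simply as a list of headword, part-of-speech and sense records kept in alphabetical order; the sample entries below are illustrative only:

```python
# Sketch: keep dictionary entries listed alphabetically by headword,
# mirroring how an entry list behaves as new words are added.
# The sample entries and translations are illustrative only.
entries = [
    {"headword": "scissors", "pos": "noun", "sense": "ножиці"},
    {"headword": "bully", "pos": "noun", "sense": "булер"},
    {"headword": "gap", "pos": "noun", "sense": "лакуна"},
]

for e in sorted(entries, key=lambda e: e["headword"]):
    print(f'{e["headword"]} ({e["pos"]}): {e["sense"]}')
```

The same record structure supports either a monolingual dictionary (English sense explanations) or a bilingual one (Ukrainian translations in the sense field).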
Step by step your dictionary grows (Figure 4).

Figure 4: Lexonomy dictionary

As we add new words they are listed in alphabetical order, and there is an option to add the transcription pattern, which can be useful for learning EFL. The headword is the word we enter; we can then select the part of speech and the sense. The sense can be used for making a monolingual dictionary, in which case the sense explanation is given in English. Alternatively, we may use the sense field to give a translation into Ukrainian. The second option takes more time, but we find it more useful for translation studies.

Figure 5: Lexonomy dictionary makeover

During the makeover you can choose a different color for the interface of your vocabulary to suit your taste (Figure 5).

4. Electronic data for creating the corpus-based registry of the Lacunicon Syncet

The idea of creating a dictionary of lacunae has been implemented by carrying out a corpus-based study of lacunae and creating the register of the Lacunicon, the Lacunicon Syncet. As the Lacunicon Syncet is the register of lacunae, it was important to classify it (the taxonomic approach), to model and mind-map it (the cognitive approach), and to make a procedure chart of the corpus-based Lacunicon Syncet organization (the corpus linguistics approach). We have therefore compiled the terms and illustrations into three main endozones (language-semiotic, communicative-translational and cognitive-synergetic), which are cross-referenced in five clusters (the paradigmatic, syntagmatic, panchronistic, cultural and cognitive clusters of the lacunicon register). The Syncet of the Lacunicon (the semantic synonyms register) has slots which are presented by shell words and context-free words of the Lacunicon, with five-facet clustering and three basic endozones.
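Similar-sounding headwords in such a register can be grouped with the Soundex algorithm. The sketch below implements the standard American Soundex variant (a letter-to-digit mapping, with vowels dropped, h/w ignored within runs, repeated codes collapsed, and the result padded to four characters); it is a generic illustration, not the specific implementation used for the Syncet:

```python
# Standard American Soundex sketch: group similar-sounding words by a
# four-character code (first letter + up to three consonant digits).
def soundex(word):
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    first = word[0].upper()
    digits = []
    prev = codes.get(word[0], "")  # the first letter's code is kept but not emitted
    for ch in word[1:]:
        if ch in "hw":             # h and w do not break a run of equal codes
            continue
        code = codes.get(ch, "")
        if code and code != prev:  # vowels reset prev, allowing repeats across them
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # both map to R163
```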
In the Syncet of Lacunicon register there are alphanumeric sorting, a concordance display and glossary illustrations, in the manner in which the terms of lacunology are vividly illustrated in COCA, the BNC and WordNet [1, 2, 15].

4.1. The Lacunicon Syncet design, ways of its verification

The Lacunicon Syncet was designed to show the register of lacunae in the form of a corpus-based library. The lacunae were verified and selected manually and placed into three main endozones according to the cognitive scenario Language-Speech-Cognition. The slots for shell words (e.g. dried-up plurality) were illustrated by context-free words (e.g. scissors). Among other approaches, we applied the Soundex algorithm [9] associated with Roget's Thesaurus to achieve greater variation of the Lacunicon Syncet. We consider the idea of building a semantic dictionary such as WordNet to be a vivid, open-source method. In addition, the LaTeX templates may be useful to compile a small starting register of the Lacunicon Syncet.

4.2. Dataset of Lacunicon register: main stages

The data comprised the Lacunicon Syncet register, created in several stages. The first stage is the procedure scheme comprising the main lexicon of the Lacunicon Syncet (the developing working term) in terms and descriptors: building the Lacunicon Syncet in slots (for shell words and context-free words of the Lacunicon register), with POS (parts of speech), ABC (alphabetical sorting), alphanumeric sort, clustering, endozone classification, a concordance display and gloss examples (Figure 6). The Lacunae Model follows the mind map Language-Speech-Cognition as LACUNA-GAP-ABSENCE. The second stage, presenting the core of the Lacunicon register, is the lexicographical data resulting in a syncet of lacunae as shell words, the name H. Schmidt proposed for the abstract words of the lacunicon. The third stage of the corpus-based Lacunicon is verification of the basic terms in the ISO databases [10], COCA [2] and the BNC [1].
The fourth stage of the Lacunicon Syncet analysis showed the semantic similarity of the terms of the corpus-based lacunicon register; the basic terms of the register are illustrated by frequency rate.

Figure 6: The Lacunicon Syncet

As Figure 7 shows, a LaTeX template (the LaTeX document preparation system running on a TeX distribution) enabled us to prepare a preformatted document for the Lacunicon project comprising lacunar terms, built on an existing schema provided by the Overleaf online LaTeX project.

Figure 7: The LaTeX Dictionary

5. The academic implications

This study is academic research with a long-term plan. It can be continued in further work on translation studies (e-dictionaries) and corpus-based analysis (applied linguistic studies), and in finding better ways to format and present dictionaries online. We also seek better ways to eliminate lacunae (unknown words) in texts and corpora in order to build better taxonomies, dictionaries and open Internet libraries for linguists and scholars willing to participate in the Lacunae Gloss and Lacuna Syncet projects. Further analysis will be directed towards creating the Lacunicon Glossary using F. Mittelbach's approach to TeX procedures [7] (also using LaTeX templates), building up the Lacunicon Syncet register, and creating a database for the Lacunicon Syncet library.

6. Discussion section

Modern translators and applied linguists must be ready to work with different linguistic data and to prepare electronic dictionaries for use and sharing. To give insight into the grounds of electronic lexicography, we have aimed to outline the new possibilities and modern tools. Thus we continue to work with electronic libraries, having made a smooth start by testing e-dictionaries of the Lexonomy family and LaTeX templates in order to build and properly format dictionaries using existing templates and accessible libraries.
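Where LaTeX and Python meet, entry lines for such a template can be generated programmatically. The sketch below emits LaTeX source from a Python list; the \entry macro is a hypothetical placeholder for whatever macro a given template actually defines, and the sample entries are illustrative:

```python
# Sketch: emit LaTeX source for dictionary entries from a Python list,
# ready to paste into an Overleaf template. The \entry macro is a
# hypothetical placeholder; substitute the macro your template defines.
entries = [("lacuna", "noun", "лакуна"), ("gap", "noun", "прогалина")]

lines = [r"\entry{%s}{%s}{%s}" % (hw, pos, tr) for hw, pos, tr in entries]
print("\n".join(lines))
```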
In terms of their professional work as translators, the ability to create a personal dictionary will help them use corpus possibilities when creating personal libraries of translation memory files and specialized dictionaries compatible with translation systems such as Wordfast and Trados.

7. Conclusions section

The linguistic profession needs to be updated so that its members understand that coding can help in creating personal templates and libraries. We conclude that electronic lexicography is our future. The ability to try and create an e-dictionary is a needed skill to add to the resumes of professional translators and applied linguists. In our study we have shown how easy steps lead to building a personal portfolio that should be electronic, comparative and beautifully formatted. Looking ahead, our students, the applied linguists, will be able to demonstrate success in integrating linguistic data with the help of various tools and formal languages. We also hope they will learn in the future how the LaTeX environment cooperates with Python. We hope to create a system of compatible formats, including tex, tmx, txt, pdf and others, for cloud instruments that will satisfy the needs of would-be professionals.

8. Acknowledgements

We would like to thank our co-workers and colleagues for their constructive comments and suggestions. The research was partly funded by Sumy State University and National Dragomanov Pedagogical University.

9. References

[1] British National Corpus. URL: http://www.mova.info/corpus.aspx
[2] Corpus of Contemporary American English. URL: http://www.corpus.byu.edu/coca/
[3] LaTeX Templates. URL: https://www.latextemplates.com/
[4] M. Měchura, Gentle introduction to Lexonomy, 2021. URL: https://www.lexonomy.eu/docs/intro
[5] E. Maks, OMBI: the practice of reversing dictionaries, International Journal of Lexicography 20(3) (2007) 259-274.
doi:10.1093/ijl/ecm028
[6] M. B. Měchura, Introducing Lexonomy: an open-source dictionary writing and publishing system, in: Electronic Lexicography in the 21st Century: Lexicography from Scratch. Proceedings of the eLex 2017 conference, 19-21 September 2017, Leiden, The Netherlands, 2017, pp. 662-679.
[7] F. Mittelbach, D. Carlisle, C. Rowley, The LaTeX3 Programming Language: a proposed system for TeX macro programming, TUGboat 18(4) (1997) 303-308.
[8] M. Meelen, É. Roux, N. Hill, Optimisation of the Largest Annotated Tibetan Corpus Combining Rule-based, Memory-based, and Deep-learning Methods, ACM Trans. Asian Low-Resour. Lang. Inf. Process. 20(1), Article 7 (March 2021), 11 pages. URL: https://doi.org/10.1145/3409488
[9] Soundex algorithm by Roget's Thesaurus. URL: http://www.roget.org/mixed.htm
[10] The ISO Concept Database. URL: https://www.iso.org/obp/ui/
[11] The Ukrainian language corpus. URL: http://www.natcorp.ox.ac.uk
[12] Ü. Viks, Eesti-X-keele sõnastik ja grammatika [Estonian-X dictionary and grammar], Eesti Rakenduslingvistika aastaraamat 4, 247-261.
[13] A. Villalva, G. Williams, LandLex: the description of lexical variation through in-depth case studies, in: A. Villalva, G. Williams (Eds.), The landscape of lexicography, Centro de Linguística da Universidade de Lisboa / Centro de Línguas, Literaturas e Culturas da Universidade de Aveiro, 2019, pp. 15-25.
[14] WordNet 3.1 Online version. URL: http://wordnetweb.princeton.edu/perl