Testing Tools for Writing and Publishing E-dictionaries

Tetiana Anokhina (a), Iryna Kobyakova (b) and Svitlana Shvachko (b)
(a) National Dragomanov Pedagogical University, Turgenivska 8/14, Kyiv, 01054, Ukraine
(b) Sumy State University, Rimsky-Korsakov, 2, 40000, Ukraine

Abstract
As we build a new electronic community to serve our needs for linguistic data, we keep looking for better ways to store informational resources and smarter ways to use localization materials, software profiles, medical instructions and legal documentation. In the new digital era, applied linguists and translators need the skills to write and publish their own e-dictionaries so that these can be used with other compatible systems. To work with this data flow, we test publishing systems that are likely to help translators compile personal dictionaries based on e-texts or corpora. Testing open-source systems offers a wide perspective for applied linguists and translators: it enables them to develop their own HTML schema or to work with available templates, without changing the underlying code, while building their e-dictionaries.

Keywords
Lexonomy, HTML schema, a simple dictionary, publishing on the web

1. Introduction
The reason we are testing e-dictionaries is the new digital era we have entered. As we build a new electronic community to serve our needs for using and storing linguistic data, we keep looking for better ways of learning foreign languages and smarter ways to translate various documents (localization materials, software profiles, medical instructions, legal documentation, etc.) without wasting time on translations whose earlier versions are already stored in our translation memory. The idea of creating dictionaries, large and small, was formulated by the European lexicographic tradition.
From the 16th century onwards, dictionary writers, who did not yet know they were lexicographers, processed pre-existing materials, reconsidered and published them, and thus gave rise to paper dictionaries and, eventually, their digital counterparts [13]. All this time publishers and printers helped to publish and innovate paper dictionaries and what we now call "their digital counterparts of today" [13]. Legacy dictionaries, great encyclopedic dictionaries and other old books were an enormous source of knowledge. From time to time their writers offered opinions that reflect the worldview of their era. Dictionaries contain lexical and encyclopedic information and provide a window into the past; importantly, they record language change.

COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent Systems, April 22-23, 2021, Kharkiv, Ukraine
EMAIL: anokhina_mail@yahoo.com (T. Anokhina); kobyakova@ukr.net (I. Kobyakova); s.shvachko@ifsk.sumdu.edu.ua (S. Shvachko)
ORCID: 0000-0002-8859-5568 (T. Anokhina); 0000-0002-9505-2502 (I. Kobyakova); 0000-0002-2119-1884 (S. Shvachko)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

We now have access to the European heritage in the digital era. Resources such as the BNC [1] and the corpus-based platform Sketch Engine provide information about the dynamic change of languages and offer online resources that are accessible both for free and on a prepaid basis. Among such projects are LandLex (legacy dictionaries) and Lexonomy (a platform compatible with Sketch Engine that enables compiling personal dictionaries) [4, 6].
Corpora offer a wide range of opportunities for deeper analysis of keywords and keywords in context, and they let you add large amounts of linguistic data to your own dictionary automatically. Paper books alone no longer meet our needs: information must be searchable online, and new dictionaries can be shared and displayed on the web. So we are letting old books rest in peace and moving to the era of e-dictionaries, which is very demanding with respect to the visualization of linguistic data. Today's linguists must be ready to format their dictionaries and make them public. They experiment with compiling dictionaries in order to find the best format for their content. Applied linguists work with datasets and acquire the necessary skills and tools for writing and publishing their e-dictionaries. To work with this data flow, they test publishing systems that are likely to help them compile dictionaries based on e-texts or corpora. Testing open-source systems offers a wide perspective for students: it enables them to develop their own HTML schema or to work with available templates without changing the underlying code while building their e-dictionaries [6]. Responding to today's needs, electronic dictionary systems keep developing, and new products show the progress of this experimentation. In addition, it is popular to use open-source dictionaries so that more and more people become familiar with encoding formats and other technical information presented in the most computationally accessible form. This new look at linguistic data, together with the flexibility of electronic templates, allows us to add entries to an existing HTML schema, which can be shortened, lengthened or rearranged [4]. Today we must think about the variety of codes and languages to use for the data flow, in our case for compiling linguistic data into electronic dictionaries.
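To illustrate the kind of keyword-in-context (KWIC) analysis corpora provide, here is a minimal sketch; the sample sentence, function name and window size are purely illustrative:

```python
# Minimal keyword-in-context (KWIC) sketch: given a tokenized text,
# collect each occurrence of a keyword with a fixed window of context.

def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) tuples for each hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits

text = "A dictionary entry links a headword to a sense and a dictionary example".split()
for left, kw, right in kwic(text, "dictionary"):
    print(f"{left:>30} | {kw} | {right}")
```

A real corpus query engine would of course index the corpus first, but the windowed-context idea is the same.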
We live in an age when paper formats alone are no longer sufficient. This is the era of digitalization, with its focus on distance learning and on electronic tools that raise our work to a higher level. The progress we make can be demonstrated on screen as we upload lecture material on applied linguistics using available resources [1, 10, 11, 15]. The new activity we are aiming at covers a large scope of linguistic studies, including diachronic studies of languages. Since corpora now give access to the electronic data of the historical heritage, applied linguists can build both modern and historical dictionaries, including thesaurus and etymological features if they wish. A virtue of the electronic format is that it can easily be transformed into a paper format. The LaTeX toolchain, for instance, makes it possible to prepare your own dictionary using existing templates. In addition, LandLex has developed a new electronic vocabulary, E-Lex, which can be integrated into the Sketch Engine environment and built on corpus-based dictionaries [1, 2, 4]. The electronic vocabulary E-Lex is the result of automated programs that demonstrated new ways of processing corpus data and that can help in developing new electronic vocabularies. Developed with the help of programmers, it can be used by a wide range of users. New ways of handling data can now support the development of new e-dictionaries. As we approach the electronic era, we find better solutions in our lexicographical practice, notably in handling topics such as flora and fauna. Cross-linguistic diachronic analyses carried out within electronic dictionaries have allowed us to experiment with new models such as European Roots. They have allowed us to develop new analytical methods for looking at image data using CAQDAS, followed by full digitization using TEI XML.
Multilingual lexicographical prototypes are offered as a means of making lexical variation apparent and of making information available without imposing a single language as a hub [13].

2. Electronic dictionaries: insights into the problem of creating an e-dictionary

The problem of extending the possibilities of electronic lexicography is becoming more relevant. Applied linguists need to learn how to use electronic dictionaries and how to create their own vocabularies on the basis of existing tools, both for independent use and for online publication with open access to the resource. We are constantly learning new words, and a personal e-dictionary is simply a new way of storing that information: a small or large electronic database, an electronic dictionary in Lexonomy, or a LaTeX dictionary (a preformatted .tex version ready for PDF printing). It is possible to design your own schema based on what you plan for your dictionary. You find and use the tool you need at the moment and compile a modern electronic resource. Electronic dictionaries can be created manually or on the basis of various corpora. The prepaid version of Sketch Engine offers the opportunity to generate an e-dictionary automatically, since it is associated with a sketch grammar that allows verbs' word sketches to be arranged by argument structure in Sketch Engine. Ideally, a word sketch would be built over million-word corpora that have been PoS-tagged in recent years [7]; additional corpora, large and small, are also being created from scratch [8]. The information is mapped onto the entry template, which is arranged by argument structure. Once the template is ready, we go to the Lexonomy page (https://www.lexonomy.eu/), a free dictionary writing tool closely connected with Sketch Engine. Lexonomy allows users to edit entry templates easily, and these can be auto-populated with information from a corpus hosted on Sketch Engine [4].
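As a sketch of what such an entry template looks like under the hood, the following fragment builds one entry as XML with the Python standard library. The element names (entry, headword, partOfSpeech, sense, translation) are an assumption modelled on Lexonomy's simple-dictionary examples and should be adjusted to the schema actually in use:

```python
# Sketch of building an entry in a Lexonomy-like XML schema.
# Element names are assumed, not Lexonomy's fixed API; adjust to your schema.
import xml.etree.ElementTree as ET

def make_entry(headword, pos, translation):
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "partOfSpeech").text = pos
    sense = ET.SubElement(entry, "sense")
    ET.SubElement(sense, "translation").text = translation
    return ET.tostring(entry, encoding="unicode")

print(make_entry("bully", "noun", "булер"))
```

Because Lexonomy entries are arbitrary XML, any schema designed this way can be pasted into the editor or generated in bulk.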
Lexonomy's out-of-the-box configuration allows users to pull up examples from a Sketch Engine corpus into individual example slots in each entry. This practice requires lexicographers to select examples manually and add them to the entries, which is time-consuming. We can find illustrations in the available corpora and send them to the entry slots, which is a good option. Still, if we need help in advancing our dictionary, the assistance of the Sketch Engine team is available for a reasonable fee.

2.1. The European lexicographical tradition

Large and small dictionaries have formed the European lexicographical tradition and become a source of knowledge and inspiration. Beginning in the 16th century, dictionary writers who did not yet know they were lexicographers processed reference materials, considered, published and innovated, and thus spawned paper dictionaries and their modern digital counterparts [13]. The problem of studying the possibilities of electronic lexicography remains relevant. Applied linguists must learn how to use electronic dictionaries and create their own dictionaries on the basis of existing tools, both for independent use and with the possibility of hosting them on the Internet, giving a translator access to the dictionary in a cloud environment.

2.2. The translation systems compatible with e-dictionaries

Modern translators work with electronic databases of linguistic data, acquiring the necessary knowledge and skills and mastering new tools for compiling and publishing their electronic dictionaries. To work with electronic data, future professionals need to understand what a modern electronic dictionary should look like, test tools for generating electronic dictionaries, and create an electronic dictionary of their own. We refer here to translation systems that have such capabilities as the most modern ones, those that should be useful to a freelance or team translator.
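Terminology is commonly moved between such translation systems in TMX-style exchange files. The following sketch writes a two-term glossary in that spirit; the minimal structure (header/body/tu/tuv/seg) follows TMX conventions, but real tools may require additional attributes, so treat this as an assumption-laden illustration rather than a conformant exporter:

```python
# Sketch: serialize a small bilingual glossary as a TMX-like XML document
# so terms can be shared between translators. Structure follows TMX
# conventions (header/body/tu/tuv/seg); real tools may expect more metadata.
import xml.etree.ElementTree as ET

def glossary_to_tmx(pairs, src="en", tgt="uk"):
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", srclang=src, datatype="plaintext")
    body = ET.SubElement(tmx, "body")
    for s, t in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per term pair
        for lang, seg_text in ((src, s), (tgt, t)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = seg_text
    return ET.tostring(tmx, encoding="unicode")

xml_out = glossary_to_tmx([("dictionary", "словник"), ("entry", "стаття")])
print(xml_out)
```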
In the first stage, translators should test electronic systems that are likely to help them compile dictionaries based on electronic texts or corpora. Testing open-source systems offers a wide perspective for students and will allow them to develop their own HTML schema or work with existing templates without changing the existing code when creating their electronic dictionaries. The purpose of our investigation is to describe the results of testing electronic lexicography systems that can be integrated into a corpus environment or are compatible with machine translation systems, whose popularity is gaining momentum. The required skills will be useful for creating glossaries of various structures, which can then be reused for each specific project. Basic skills include the ability to create electronic dictionaries based on existing templates and schemas, to edit the e-dictionary, to add new terms, and to use additional dictionaries in machine translation systems such as Wordfast and Trados, which are located in a cloud environment. Basic skills also include the ability to upload your dictionary in order to exchange terminology with other translators, colleagues and freelancers. The cloud environment itself allows the accumulation of large terminological databases that can be created and corrected instantly, with the changes saved in the cloud.

3. The ABBYY Lingvo family of electronic dictionaries

The best-known electronic dictionaries include the translation systems by ABBYY Lingvo. These dictionaries are designed for both desktop and mobile devices and are an indispensable offline aid for the modern translator. The ABBYY Lingvo dictionary will be useful for those who translate or teach English, German, Spanish, French and other languages. The professional version includes all available thematic dictionaries on economics, law, medicine, oil refining, mechanical engineering, etc.
The ABBYY Lingvo family of electronic dictionaries contains additional built-in authentic monolingual dictionaries (e.g. Collins COBUILD) with up-to-date English vocabulary. The system provides the high-quality, instant translation that most modern users expect. All you have to do is hover over a still-unknown word in an email, a movie subtitle or a PDF file. From the translation window you can quickly add the word to the application, as well as view its transcription or listen to it if you wish. Importantly, the application helps to memorize new words: you can expand your vocabulary with the universal ABBYY Lingvo Tutor program. The user can enter an unknown word (e.g. Eng. bully), enter its translation (e.g. Ukr. булер), and regularly replenish their electronic dictionary with other terminological and neological units. To make this process more efficient, the application contains ready-made dictionaries of basic vocabulary for English, German, French, Spanish, Italian and Portuguese. The lexical items in these electronic dictionaries are presented by frequency of use and sorted thematically (business vocabulary, weather, etc.). In today's world, with constant access to the Internet, vocabularies located in cloud environments are indispensable. Multitran's cloud dictionary, with the ability to enter your own terms, can be useful for translators who use the translation base to check the accuracy of term translations across different subject groups. The modern world forces us to look at lexicographic data in a new way, and lexicographers are experimenting with new ways of presenting information. We find Lexonomy a very useful tool. As Michal Měchura states, Lexonomy is a web-based platform for writing and publishing dictionaries. Its mission is to be an easy-to-use tool for small to medium-sized dictionary projects.
In Lexonomy, individuals and teams can create a dictionary, design an arbitrary XML structure for the entries, edit entries, and eventually make the dictionary publicly available as a 'microsite' within the Lexonomy website. Lexonomy exists in order to lower the barriers of entry into modern born-digital lexicography. Compared to other dictionary writing systems, it requires no installation or set-up, expects no knowledge of coding or programming, and is free of financial cost. It is simply a website where lexicographers can sign up and start working [6]. Reversing a bilingual dictionary has already been discussed by Michal Měchura and other scholars [5], and we follow this tendency to work with electronic lexicography [4, 6]. Here, however, it is interesting to follow the process of compiling a simple bilingual dictionary step by step. Figure 1 shows how entries are added from the existing template.

Figure 1: Lexonomy platform: entry

During the compilation of the dictionary, we add the translation into the sense area. The manual work takes time, but the result is satisfying. If you use the Sketch Engine tool, the speed of compiling your dictionary will be higher, at least at the first step of entering new entries. The advantage of automatically compiled entries is that little manual work is needed: we set the number of entries and the source, and the system adds entries from the Sketch Engine database (which is really huge) to your Lexonomy dictionary (Figure 2).

Figure 2: Lexonomy platform: English-Ukrainian pairs

The polished template elaborated by the team of programmers makes your dictionary look really impressive: no paper dictionary can be compared with a preformatted, well-designed electronic dictionary (Figure 3).

Figure 3: Lexonomy platform: English-Ukrainian pairs

The user can format the existing template using the embedded option.
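A bilingual dictionary of this kind can be modelled simply as a list of headword, part-of-speech and sense records kept in alphabetical order; the sample entries below are illustrative only:

```python
# Sketch: keep dictionary entries listed alphabetically by headword,
# mirroring how an entry list behaves as new words are added.
# The sample entries and translations are illustrative only.
entries = [
    {"headword": "scissors", "pos": "noun", "sense": "ножиці"},
    {"headword": "bully", "pos": "noun", "sense": "булер"},
    {"headword": "gap", "pos": "noun", "sense": "лакуна"},
]

for e in sorted(entries, key=lambda e: e["headword"]):
    print(f'{e["headword"]} ({e["pos"]}): {e["sense"]}')
```

The same record structure supports either a monolingual dictionary (English sense explanations) or a bilingual one (Ukrainian translations in the sense field).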
Step by step your dictionary grows (Figure 4).

Figure 4: Lexonomy dictionary

As we add new words they are listed in alphabetical order, and there is an option to add the transcription pattern, which can be useful for learning EFL. The headword is the word we enter; we can then select the part of speech and the sense. The sense can be used for making a monolingual dictionary, in which case the sense explanation is given in English. Alternatively, we may use the sense field to give a translation into Ukrainian. The second option takes more time, but we find it more useful for translation studies.

Figure 5: Lexonomy dictionary makeover

During the makeover you can choose a different color for the interface of your vocabulary to suit your taste (Figure 5).

4. Electronic data for creating the corpus-based registry of the Lacunicon Syncet

The idea of creating a dictionary of lacunae has been implemented by carrying out a corpus-based study of lacunae and creating the register of the Lacunicon, the Lacunicon Syncet. As the Lacunicon Syncet is the register of lacunae, it was important to classify it (the taxonomic approach), to model and mind-map it (the cognitive approach), and to make a procedure chart of the corpus-based Lacunicon Syncet organization (the corpus linguistics approach). We have therefore compiled the terms and illustrations into three main endozones (language-semiotic, communicative-translational and cognitive-synergetic), which are cross-referenced in five clusters (the paradigmatic, syntagmatic, panchronistic, cultural and cognitive clusters of the lacunicon register). The Syncet of the Lacunicon (the semantic synonyms register) has slots which are presented by shell words and context-free words of the Lacunicon, with five-facet clustering and three basic endozones.
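Similar-sounding headwords in such a register can be grouped with the Soundex algorithm. The sketch below implements the standard American Soundex variant (a letter-to-digit mapping, with vowels dropped, h/w ignored within runs, repeated codes collapsed, and the result padded to four characters); it is a generic illustration, not the specific implementation used for the Syncet:

```python
# Standard American Soundex sketch: group similar-sounding words by a
# four-character code (first letter + up to three consonant digits).
def soundex(word):
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    first = word[0].upper()
    digits = []
    prev = codes.get(word[0], "")  # the first letter's code is kept but not emitted
    for ch in word[1:]:
        if ch in "hw":             # h and w do not break a run of equal codes
            continue
        code = codes.get(ch, "")
        if code and code != prev:  # vowels reset prev, allowing repeats across them
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # both map to R163
```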
In the Syncet of Lacunicon register there are alphanumeric sorting, a concordance display and glossary illustrations, in the manner in which the terms of lacunology are vividly illustrated in COCA, the BNC and WordNet [1, 2, 15].

4.1. The Lacunicon Syncet design, ways of its verification

The Lacunicon Syncet was designed to show the register of lacunae in the form of a corpus-based library. The lacunae were verified and selected manually and placed into three main endozones according to the cognitive scenario Language-Speech-Cognition. The slots for shell words (e.g. dried-up plurality) were illustrated by context-free words (e.g. scissors). Among other approaches, we applied the Soundex algorithm [9] associated with Roget's Thesaurus to achieve greater variation of the Lacunicon Syncet. We consider the idea of building a semantic dictionary such as WordNet to be a vivid, open-source method. In addition, the LaTeX templates may be useful to compile a small starting register of the Lacunicon Syncet.

4.2. Dataset of Lacunicon register: main stages

The data comprised the Lacunicon Syncet register, created in several stages. The first stage is the procedure scheme comprising the main lexicon of the Lacunicon Syncet (the developing working term) in terms and descriptors: building the Lacunicon Syncet in slots (for shell words and context-free words of the Lacunicon register), with POS (parts of speech), ABC (alphabetical sorting), alphanumeric sort, clustering, endozone classification, a concordance display and gloss examples (Figure 6). The Lacunae Model follows the mind map Language-Speech-Cognition as LACUNA-GAP-ABSENCE. The second stage, presenting the core of the Lacunicon register, is the lexicographical data resulting in a syncet of lacunae as shell words, the name H. Schmidt proposed for the abstract words of the lacunicon. The third stage of the corpus-based Lacunicon is verification of the basic terms in the ISO databases [10], COCA [2] and the BNC [1].
The fourth stage of the Lacunicon Syncet analysis showed the semantic similarity of the terms of the corpus-based lacunicon register; the basic terms of the register are illustrated by frequency rate.

Figure 6: The Lacunicon Syncet

As Figure 7 shows, a LaTeX template (the LaTeX document preparation system running on a TeX distribution) enabled us to prepare a preformatted document for the Lacunicon project comprising lacunar terms, built on an existing schema provided by the Overleaf online LaTeX project.

Figure 7: The LaTeX Dictionary

5. The academic implications

This study is academic research with a long-term plan. It can be continued in further work on translation studies (e-dictionaries) and corpus-based analysis (applied linguistic studies), and in finding better ways to format and present dictionaries online. We also seek better ways to eliminate lacunae (unknown words) in texts and corpora in order to build better taxonomies, dictionaries and open Internet libraries for linguists and scholars willing to participate in the Lacunae Gloss and Lacuna Syncet projects. Further analysis will be directed towards creating the Lacunicon Glossary using F. Mittelbach's approach to TeX procedures [7] (also using LaTeX templates), building up the Lacunicon Syncet register, and creating a database for the Lacunicon Syncet library.

6. Discussion section

Modern translators and applied linguists must be ready to work with different linguistic data and to prepare electronic dictionaries for use and sharing. To give insight into the grounds of electronic lexicography, we have aimed to outline the new possibilities and modern tools. Thus we continue to work with electronic libraries, having made a smooth start by testing e-dictionaries of the Lexonomy family and LaTeX templates in order to build and properly format dictionaries using existing templates and accessible libraries.
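Where LaTeX and Python meet, entry lines for such a template can be generated programmatically. The sketch below emits LaTeX source from a Python list; the \entry macro is a hypothetical placeholder for whatever macro a given template actually defines, and the sample entries are illustrative:

```python
# Sketch: emit LaTeX source for dictionary entries from a Python list,
# ready to paste into an Overleaf template. The \entry macro is a
# hypothetical placeholder; substitute the macro your template defines.
entries = [("lacuna", "noun", "лакуна"), ("gap", "noun", "прогалина")]

lines = [r"\entry{%s}{%s}{%s}" % (hw, pos, tr) for hw, pos, tr in entries]
print("\n".join(lines))
```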
In terms of their professional work as translators, the ability to create a personal dictionary will help them use corpus possibilities when creating personal libraries of translation memory files and specialized dictionaries compatible with translation systems such as Wordfast and Trados.

7. Conclusions section

The linguistic profession needs to be updated so that its members understand that coding can help in creating personal templates and libraries. We conclude that electronic lexicography is our future. The ability to try and create an e-dictionary is a needed skill to add to the resumes of professional translators and applied linguists. In our study we have shown how easy steps lead to building a personal portfolio that should be electronic, comparative and beautifully formatted. Looking ahead, our students, the applied linguists, will be able to demonstrate success in integrating linguistic data with the help of various tools and formal languages. We also hope they will learn in the future how the LaTeX environment cooperates with Python. We hope to create a system of compatible formats, including tex, tmx, txt, pdf and others, for cloud instruments that will satisfy the needs of would-be professionals.

8. Acknowledgements

We would like to thank our co-workers and colleagues for their constructive comments and suggestions. The research was partly funded by Sumy State University and National Dragomanov Pedagogical University.

9. References

[1] British National Corpus. URL: http://www.mova.info/corpus.aspx
[2] Corpus of Contemporary American English. URL: http://www.corpus.byu.edu/coca/
[3] LaTeX Templates. URL: https://www.latextemplates.com/
[4] M. Měchura, Gentle introduction to Lexonomy, 2021. URL: https://www.lexonomy.eu/docs/intro
[5] E. Maks, OMBI: the practice of reversing dictionaries, International Journal of Lexicography 20(3) (2007) 259-274.
doi:10.1093/ijl/ecm028
[6] M. B. Měchura, Introducing Lexonomy: an open-source dictionary writing and publishing system, in: Electronic Lexicography in the 21st Century: Lexicography from Scratch. Proceedings of the eLex 2017 conference, 19-21 September 2017, Leiden, The Netherlands, 2017, pp. 662-679.
[7] F. Mittelbach, D. Carlisle, C. Rowley, The LaTeX3 Programming Language: a proposed system for TeX macro programming, TUGboat 18(4) (1997) 303-308.
[8] M. Meelen, É. Roux, N. Hill, Optimisation of the Largest Annotated Tibetan Corpus Combining Rule-based, Memory-based, and Deep-learning Methods, ACM Trans. Asian Low-Resour. Lang. Inf. Process. 20(1), Article 7 (March 2021), 11 pages. URL: https://doi.org/10.1145/3409488
[9] Soundex algorithm by Roget's Thesaurus. URL: http://www.roget.org/mixed.htm
[10] The ISO Concept Database. URL: https://www.iso.org/obp/ui/
[11] The Ukrainian language corpus. URL: http://www.natcorp.ox.ac.uk
[12] Ü. Viks, Eesti-X-keele sõnastik ja grammatika [Estonian-X dictionary and grammar], Eesti Rakenduslingvistika aastaraamat 4, 247-261.
[13] A. Villalva, G. Williams, LandLex: the description of lexical variation through in-depth case studies, in: A. Villalva, G. Williams (Eds.), The landscape of lexicography, Centro de Linguística da Universidade de Lisboa / Centro de Línguas, Literaturas e Culturas da Universidade de Aveiro, 2019, pp. 15-25.
[14] WordNet 3.1 Online version. URL: http://wordnetweb.princeton.edu/perl