Terminology Dictionary Digitalization

Volodymyr Shyrokov 1, Iryna Ostapova 1, Yevhen Kupriianov 2, Alona Dorozhynska 1, Mykyta Yablochkov 1 and Iuliia Verbynenko 1

1 Ukrainian Lingua-Information Fund of NAS of Ukraine, 3, Holosiivskyi avenue, Kyiv, 03039, Ukraine
2 National Technical University “Kharkiv Polytechnic Institute”, Kyrpychova str. 2, Kharkiv, 61002, Ukraine

Abstract
One of the most important tasks of Ukrainian lexicography is to develop a technology for converting the entire dictionary heritage into digital format. Many national dictionaries traditionally published on paper are now being digitalized in line with current world trends, and this requires an adequate technological solution. Various approaches to dictionary digitalization have been developed, but a general solution applicable to dictionaries of different types has not yet been found. The purpose of our research is therefore to build and propose a digitalization technology that is common to, and usable for, different dictionaries. The Dictionary of Ukrainian Biological Terminology, which has a rather large volume and a complex structure, was chosen for digitalization. The proposed technology is a step-by-step conversion of the dictionary from paper text to a website version. The basic steps are as follows: 1) text in PDF format, 2) HTML file, 3) formal model of the dictionary, referred to as the lexicographic system, 4) XML file, 5) database, 6) website. The first step was converting the PDF to a simple HTML file that contains only visual markup. The next and main stage was developing the model of the dictionary's lexicographic system, which serves as the basis for the XML structure of the dictionary entry. Further digitalization was based on the XML file: the dictionary text was marked up with XML tags using special software. In the subsequent steps the database and the website were developed.
Through the website interface, the user gets not only access for updating and revising the dictionary text but also continuous technical support.

Keywords 1
Computer lexicography, lexicographic system, parsing, XML, database, digital space, website.

COLINS-2022: 6th International Conference on Computational Linguistics and Intelligent Systems, May 12–13, 2022, Gliwice, Poland
EMAIL: vshirokov48@gmail.com (V. Shyrokov); irinaostapova@gmail.com (I. Ostapova); eugeniokuprianov@gmail.com (Y. Kupriianov); alonochkatkachyk@gmail.com (A. Dorozhynska); gezartos@gmail.com (M. Yablochkov); yulia_verbinenko@yahoo.co.uk (I. Verbynenko)
ORCID: 0000-0001-5563-8907 (V. Shyrokov); 0000-0001-8221-3277 (I. Ostapova); 0000-0002-0801-1789 (Y. Kupriianov); 0000-0001-6554-6731 (A. Dorozhynska); 0000-0003-1175-1603 (M. Yablochkov); 0000-0002-7111-0755 (I. Verbynenko)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

One of the tasks addressed by modern computer lexicography is the creation of digital dictionaries, in particular multilingual terminology dictionaries. Most of them have no digital version, so their digitalization is an urgent task. Many of the tools created to date are applicable only to individual stages of the terminology dictionary making process; there are no universal technological solutions to the basic problems of digital terminography. This is especially true of the digital reception of the traditional terminological heritage, in particular the multilingual one [5,6].

Among the whole variety of dictionaries, the Dictionary of Ukrainian Biological Terminology [1] was chosen for digitization. According to its authors, this dictionary is the first lexicographical work of the new generation in Ukrainian studies, covering the most common biological terminology in Ukrainian, Russian and English and offering term definitions. The dictionary covers the normative general scientific and widely used terminology of the biological sciences, as recorded in modern encyclopedic, general and special dictionaries, as well as in scientific, popular science, educational and informative literature.

Our approach offers a stage-by-stage conversion of the dictionary text into website format. The main stages are as follows:

Paper dictionary => Lexicographic system (L-system) of the dictionary => XML tagging of the dictionary text following the L-system structure => Converting the XML-tagged text of the dictionary into database format => Website version of the dictionary on the ULP (Ukrainian linguistic portal).

In our opinion, this technological procedure contains steps that can be applied to other dictionaries, so we believe that this sequence is an effective and universal way to transform paper dictionaries into digital format.

2. Method

2.1. Term dictionary conceptual model

The digital transformation of lexicographic works requires a general theoretical framework able to describe and represent the widest possible range of lexicographic objects. Our developments are based on the theory of lexicographic systems. The dictionary is considered as an information system of a special type, namely a lexicographic one. This is an abstract language-information object focused on a comprehensive description of the lexical and grammatical structures of a language or a set of languages [3, 4]. The system architecture corresponds to the standard three-level ANSI/X3/SPARC architecture of information systems, according to which an information system is divided into conceptual, internal and external data levels [3]. The internal level defines the types, structures and formats in which data are to be represented, stored and manipulated.
The external level provides a set of procedures which allow the user to manipulate the data represented at the internal level. The conceptual level of representation (conceptual model) is a symbolic, semantic model which integrates the views of the various specialists on the domain in an unambiguous, finite and consistent way.

As a conceptual model we have chosen the lexicographic data model [3], represented here in a simplified form:

\[ \{ D,\ I_0^Q(D),\ V(I_0^Q(D)),\ \beta,\ \sigma[\beta],\ \mathrm{Red}[V(I_0^Q(D))] \} \qquad (1) \]

where:
• $D$ is the modeling object (domain), in our case the Dictionary of Ukrainian Biological Terminology;
• $I_0^Q(D) = \{x_i\}$ is the set of language units described in the dictionary (in the theory of lexicographic systems it is usually referred to as the set of elementary information units);
• $V(I_0^Q(D))$ is the set of descriptions (interpretations) of the elementary information units (for dictionaries, the set $V(I_0^Q(D)) = \{V(x_i)\}$ corresponds to the set of dictionary entries dedicated to the words $x_i$);
• $\beta$ is the set of structural elements to be revealed in the process of analyzing the dictionary text;
• $\sigma[\beta]$ is the structure generated on $\beta$ by the operator $\sigma$; it represents the system of relationships reflecting the semantics of the domain considered. The restriction of $\sigma[\beta]$ to $V(x)$ gives the microstructure $\sigma(x)$ of the dictionary entry $V(x)$;
• $\mathrm{Red}[V(I_0^Q(D))]$ is the mechanism of recursive reduction which reveals the finest structures of the lexicographic system.

The structures $\beta$ and $\sigma[\beta]$ specify the semantics of the linguistic facts and regularities composing the lexicographic system (L-system). In this case $\beta$ is the set of the simplest information elements of the dictionary (words, abbreviations, entry notes, numbers, elements of grammar and vocabulary description, etc.). The structures $\beta$ are given explicitly in the dictionary and are defined in the following way: a set $\beta(x)$ forming the entry $V(x)$ is assigned to each $x \in I_0^Q(D)$ so that:
1. $x \in \beta(x)$.
2. Any fragment of the dictionary entry $V(x)$ can be built from elements belonging to $\beta(x)$.
3. The principle of identifying and defining each $\beta(x)$ is common to all $V(x)$ with headwords $x \in I_0^Q(D)$.

2.2. Dictionary entry structure

The conceptual model was built taking into account the paper version of the dictionary in question. That is, we analyzed the typographic design, layout and structure of the printed dictionary entries and interpreted them as identifiers of the corresponding elements of the lexicographic structures $\beta$ and $\sigma[\beta]$.

The following elements compose the dictionary entry structure:
• СС: dictionary entry (represented by a paragraph in the text)
• ЗТ_У: head term in Ukrainian (НО, the homonym number, is an attribute of the head term)
• ТБі: term block (a text line composed of the Ukrainian, Russian and English terms together with their parameters)
• СМБі: explanatory block (the entire dictionary entry text without ТБ); the number of explanatory blocks corresponds to the number of term blocks

To clarify the dictionary entry structure, we have divided each term block into sub-blocks which are separated from each other by the language marker:
• ПТБ_У: Ukrainian sub-block (includes the whole text of the dictionary entry in Ukrainian together with all the parameters)
• ПТБ_Р: Russian sub-block (includes the whole text of the dictionary entry in Russian together with all the parameters)
• ПТБ_А: English sub-block (includes the whole text of the dictionary entry in English together with all the parameters)

In its turn, a sub-block may comprise several complexes dedicated to the Ukrainian, Russian and English terms:
• ТК_У1-n: complex for Ukrainian term (each group includes one term and all its parameters in Ukrainian)
• ТК_Р1-n: complex for Russian term (each group includes one term and all its parameters in Russian)
• ТК_А1-n: complex for English term (each group includes one term and all its parameters in English)

Each term complex comprises the following elements:
language marker (ММ), grammar note before the term (ГРД), explanatory note (СР), term (Т) and grammar note after the term (ГРП). There can be only one term in one complex. The language marker has been introduced in the complex for the Ukrainian term for the sake of generalization. The structure of the complexes for Ukrainian, Russian and English terms is given below.

ТК_У1-n: complex for Ukrainian term
  ММ: language marker (укр.), introduced for generalization
  ГРД: grammar note before the term (all grammar parameters placed before the Ukrainian term)
  Т_У: Ukrainian term
  ГРП: grammar note after the term (all grammar parameters placed after the Ukrainian term)

ТК_Р1-n: complex for Russian term
  ММ: marker for the Russian language (identified as рос. in the text)
  СР: explanatory note (all explanatory parameters)
  ГРД: grammar note before the term
  Т_Р: Russian term
  ГРП: grammar note after the term

ТК_А1-n: complex for English term
  ММ: language marker (identified as англ. in the text)
  СР: explanatory note
  ГРД: grammar note before the term
  Т_А: English term
  ГРП: grammar note after the term

The explanatory block (СМБ) is made up of definition blocks (БТm). There are as many definition blocks as there are definitions in the dictionary entry. Each definition block consists of:
• НТ: definition number
• ТЛ: definition
• СРТ: explanatory note to ТЛ
• ПБТС: collocation sub-block including all the collocations formed with the term
• БП: reference block
• БСН_ЗТ: block of synonyms to the head term

The collocation sub-block is made up of blocks of term collocations (there may be several of them). Each block of term collocations consists of a term block of the collocation (ТБСЛ), an explanatory block of the collocation (БТСЛ) and a block of synonyms to the collocation (БСН_СЛ). Similarly to term blocks, we introduce sub-blocks of the term block of collocations for the Ukrainian, Russian and English languages: ПТБСЛ_У, ПТБСЛ_Р, ПТБСЛ_А. Each of them may include several complexes for term collocations.
The complexes are formed by the language marker (ММ), the grammar note before the term collocation (ГРСД), the term collocation (ТС) and the grammar note after the term collocation (ГРСП). No explanatory notes before term collocations were found in the dictionary. There can be only one term collocation in a complex.

ПТБСЛ_У: term collocation sub-block for Ukrainian
ПТБСЛ_Р: term collocation sub-block for Russian
ПТБСЛ_А: term collocation sub-block for English
ТСК_У1-p: sub-block of the term block for Ukrainian
ТСК_Р1-p: sub-block of the term block for Russian
ТСК_А1-p: sub-block of the term block for English
ММ: language marker
ГРСД: grammar note before the term collocation
ГРСП: grammar note after the term collocation
ТС_У: term collocation in Ukrainian
ТС_Р: term collocation in Russian
ТС_А: term collocation in English

The explanatory block of term collocations (БТСЛ) consists of the definition number of the term collocation (НТСЛ) and the definition itself (ТЛС).

The synonym blocks to the head term (БСН_ЗТ) and to the term collocation (БСН_СЛ) consist of the synonym marker (МС) and the synonym row (СН1…СНn):
• МС: synonym marker (“Син.”)
• СН1…СНn: synonym row (may include both terms and term collocations)

The reference block (БП) consists of sub-blocks (ПБП), each of which includes a whole array of references. The number of sub-blocks corresponds to the number of reference markers (МП); the reference marker is the note “див.” (see). Each sub-block is further subdivided into individual references, and into the addressee (САНТ) and the recipients (САТj).

The general XML scheme used to mark up the dictionary is as follows:

Entry
  <ЗТ_У homonym number=і> Ukrainian head term
  <ТБ number=і> Term block
    <ТК_У number=і> Ukr. term complex
      <Т_У> Ukrainian term
      <ГРД number=і> Grammar note to
      <ГРП number=і> Grammar note to
      <ММ> укр.
    <ТК_Р number=і> Rus. term complex
      <Т_Р> Russian term
      <СР> Explanatory note
      <ГРД number=і> Grammar note to
      <ГРП number=і> Grammar note to
      <ММ> рос.
    <ТК_А number=і> Engl. term complex
      <Т_А> English term
      <СР> Explanatory note
      <ГРД number=і> Grammar note to
      <ГРП number=і> Grammar note to
      <ММ> англ.
  <СМБ number=і> Explanatory block
    <БТ number=і> Definition block
      <ТЛ> Definition
      <НТ> Definition number
      <СРТ> Explanatory note
      <СИН_ЗТ number=і> Synonym block
        <Т_У> term
        <ТС_У> term
        <МС> Син.
      <БТС number=і> Term collocations block
        <ТБСЛ number=і> Term collocation block
          <ТКС_У number=і> Ukrainian term collocation complex
            <ТС_У> Term collocation
            <ГРС> Grammar note
            <ММ> Language marker
          <ТКС_Р number=і> Russian term collocation complex
            <ТС_Р> Term collocation
            <ГРС> Grammar note
            <ММ> Language marker
          <ТКС_А number=і> English term collocation complex
            <ТС_А> Term collocation
            <ГРС> Grammar note
            <ММ> Language marker
        <БТСЛ number=і> Term collocation explanatory block
          <ТЛС> Definition of the collocation
          <НТЛС> Number of the collocation definition
        <СИН_СЛ number=і> Synonym block
          <Т_У> term
          <ТС_У> term
          <МС> Син.
      <БП number=і> Reference block
        <ПБП number=і> Reference sub-block
          <САНТ> addressee
          <САТ number=і> recipient
          <МП> reference marker

2.3. Example of marking up a dictionary entry with XML tags

The example below shows the printed version of an entry, followed by its decomposition according to the entry structure developed above and by the corresponding XML text.

двохo#дкові, -их, ім., мн. (рос. двухo#дковые, англ. ringed lizards, worm lizard) 1. Червоподібні плазуни, тіло яких укрите суцільною роговою плівкою, поділеною на квадрати поздовжніми і поперечними борозенками.

(In English, the definition reads: worm-like reptiles whose body is covered with a continuous horny film divided into squares by longitudinal and transverse grooves.)

Entry elements:

ТБ [Term block]: двохo#дкові, -их, ім., мн. (рос. двухo#дковые, англ. ringed lizards, worm lizard)
  ПТБ_У [Ukrainian sub-block]: двохo#дкові, -их, ім., мн.
    ТК_У [Complex for Ukrainian term]: двохo#дкові, -их, ім., мн.
      ЗТ [Head term]: двохo#дкові
      ГРП [Grammar note after the term]: -их, ім., мн.
  ПТБ_Р [Russian sub-block]: рос. двухo#дковые
    ТК_Р [Complex for Russian term]: рос. двухo#дковые
      ММ [Language marker]: рос.
      Т_Р [Russian term]: двухo#дковые
  ПТБ_А [English sub-block]: англ. ringed lizards, worm lizard
    ТК_А1 [Complex for English term]: англ. ringed lizards
    ТК_А2 [Complex for English term]: worm lizard
      ММ [Language marker]: англ.
      Т_А1 [English term]: ringed lizards
      Т_А2 [English term]: worm lizard
СМБ [Explanatory block]: 1. Червоподібні плазуни, тіло яких укрите суцільною роговою плівкою, поділеною на квадрати поздовжніми і поперечними борозенками.
  БТ [Definition block]: 1. Червоподібні плазуни, тіло яких укрите суцільною роговою плівкою, поділеною на квадрати поздовжніми і поперечними борозенками.
    НТ [Definition number]: 1
    ТЛ [Definition]: Червоподібні плазуни, тіло яких укрите суцільною роговою плівкою, поділеною на квадрати поздовжніми і поперечними борозенками.

The XML text reflecting the entry structure of the term dictionary in question is as follows:

<СС>
  <текст_СС><![CDATA[двохo#дкові, -их, ім., мн. (рос. двухo#дковые, англ. ringed lizards, worm lizard) червоподібні плазуни, тіло яких укрите суцільною роговою плівкою, поділеною на квадрати поздовжніми і поперечними борозенками.]]></текст_СС>
  <ЗТ homonymy number='0'>двохo#дкові</ЗТ>
  <ТБ number="1">
    <ТК_У number="1">
      <Т_У>двохo#дкові</Т_У>
      <ГРП>-их, ім., мн.</ГРП>
      <ММ>укр.</ММ>
    </ТК_У>
    <ТК_Р number="1">
      <Т_Р>двухo#дковые</Т_Р>
      <ММ>рос.</ММ>
    </ТК_Р>
    <ТК_А number="1">
      <Т_А>ringed lizards</Т_А>
      <ММ>англ.</ММ>
    </ТК_А>
    <ТК_А number="2">
      <Т_А>worm lizard</Т_А>
      <ММ>англ.</ММ>
    </ТК_А>
  </ТБ>
  <СМБ number="1">
    <БТ number="1">
      <НТ>1</НТ>
      <ТЛ>червоподібні плазуни, тіло яких укрите суцільною роговою плівкою, поділеною на квадрати поздовжніми і поперечними борозенками</ТЛ>
    </БТ>
  </СМБ>
</СС>

3. Experiment

3.1. Dictionary text representation in lexicographic database structure

C# and the .NET 5 platform were chosen as the programming language and technological platform for development. Because the dictionary entry is represented as a structure of objects, a document-oriented database was used that meets the following requirements:
• Ease of use
• Support for transaction mechanisms
• Support for parallel access to the database
• Free of charge for research purposes
As a result, LiteDB (https://www.litedb.org/) was chosen: a document-oriented database created as a relatively simple, free analogue of MongoDB. An additional advantage of this database is the ease of installing and connecting it, as LiteDB is delivered as a single library file (dll) and a single configuration file (xml) rather than as an entire software package. It is informally described as an analogue of SQLite for document databases. The parsing library Html Agility Pack (https://html-agility-pack.net) was used to process the obtained XML files in the software environment.

For developing the structure of the repository class, two opposing approaches were considered:
1) creating a “family” of classes, where each class is a separate structural element and the relationships between them are references to instances of the respective classes;
2) using nested classes, where the whole hierarchy of structural elements is part of the main parent class.

For the goals set, bringing all the structural elements into separate independent collections in the database (a direct consequence of the first approach) is too redundant, and implementing access to them unreasonably increases the complexity of the program logic. The second approach was therefore adopted and further modified by converting the structural elements from nested classes to nested structures, in order to optimize the subsequent use of the dictionary entry classes by the application.

Each dictionary entry is represented in the internal model of the application by a class of the following type:
• Class “DictionaryStorageClass”: container for the dictionary entry decomposed into its structural elements.

Since the application uses a document-oriented database, the data are stored in the same form as they are represented in the internal model.
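To illustrate the second (nested) approach, the following minimal sketch reads a simplified entry from XML into nested structures and turns it into a single document, as a document-oriented store would hold it. It is written in Python rather than C#, does not use LiteDB or Html Agility Pack, and rests on several assumptions: the tag names are ASCII stand-ins for the Cyrillic tags of Section 2.2 (TB for the term block, TK_* for term complexes, T_* for terms, GRP for the grammar note after the term, MM for the language marker), closing tags are present, and the class and field names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, asdict
from typing import List

# Simplified sample entry; ASCII tag names are stand-ins (an assumption).
SAMPLE = """<CC>
  <TB number="1">
    <TK_U number="1"><T_U>двохo#дкові</T_U><GRP>-их, ім., мн.</GRP><MM>укр.</MM></TK_U>
    <TK_R number="1"><T_R>двухo#дковые</T_R><MM>рос.</MM></TK_R>
    <TK_A number="1"><T_A>ringed lizards</T_A><MM>англ.</MM></TK_A>
    <TK_A number="2"><T_A>worm lizard</T_A><MM>англ.</MM></TK_A>
  </TB>
</CC>"""


@dataclass
class TerminologyComplex:
    term: str
    language_mark: str
    grammar_following: List[str] = field(default_factory=list)


@dataclass
class TerminologyBlock:
    linking_id: int
    complexes: List[TerminologyComplex] = field(default_factory=list)


@dataclass
class DictionaryEntry:
    head_term: str
    terms: List[TerminologyBlock] = field(default_factory=list)


def parse_entry(xml_text: str) -> DictionaryEntry:
    """Walk the XML tree once and fill the nested structures."""
    root = ET.fromstring(xml_text)
    blocks = []
    for tb in root.findall("TB"):
        complexes = []
        for tk in tb:  # every child of a term block is a term complex
            term = next(c.text for c in tk if c.tag.startswith("T_"))
            complexes.append(TerminologyComplex(
                term=term,
                language_mark=tk.findtext("MM", default=""),
                grammar_following=[g.text for g in tk.findall("GRP")],
            ))
        blocks.append(TerminologyBlock(int(tb.get("number")), complexes))
    # Assumption: the first term of the first block is the head term.
    head = blocks[0].complexes[0].term if blocks else ""
    return DictionaryEntry(head_term=head, terms=blocks)


entry = parse_entry(SAMPLE)
document = asdict(entry)  # the whole hierarchy becomes one nested document
restored = json.loads(json.dumps(document, ensure_ascii=False))
```

Because the whole hierarchy lives inside one parent object, a single conversion yields the stored document, and no cross-collection references need to be resolved when the entry is read back.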
To ensure the coherence and efficiency of the development process, as well as of the use of the repository classes and the dictionary entry index classes (described below), several types of constants were defined:
• Language list: enumerator Languages, values Ukrainian, Russian, English.
• List of term structure characteristics: enumerator TerminologyStructures, values Word, Collocation.
• List of term types: enumerator TerminologyTypes, values MainTerm, SecondaryTerm, LinkedTerm, Synonym.
• List of language markers: array of text variables LanguageMarks, values “укр.”, “рос.”, “англ.”.

The class of “DictionaryStorageClass” type has the following structure in the lexicographic database:
• Dictionary entry identifier: integer variable ID.
• Head term: text variable OriginalDicEntryString.
• Homonymy indicator of the head term: integer variable Omonim.
• Original text of the dictionary entry in text line format: text variable OriginalDicEntryString.
• List of term blocks in the dictionary entry: list TermsList of TerminologyBlock elements.
• List of explanatory blocks in the dictionary entry: list SemanticBlocksList of SemanticBlock elements.
• Entry text in HTML format, generated on the basis of the class: text variable DicEntryHTMLString.
• Entry text without tags, generated on the basis of the class: text variable DicEntryNoTagsString.
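The enumerations and the marker array introduced at the beginning of this subsection could be mirrored as follows. This is a Python sketch, not the authors' C# code: the enumerator and value names come from the text, while the numeric values, the MARK_TO_LANGUAGE mapping and the language_of helper are assumptions added for illustration.

```python
from enum import Enum


class Languages(Enum):
    Ukrainian = 1
    Russian = 2
    English = 3


class TerminologyStructures(Enum):
    Word = 1
    Collocation = 2


class TerminologyTypes(Enum):
    MainTerm = 1
    SecondaryTerm = 2
    LinkedTerm = 3
    Synonym = 4


# Language markers as they appear in the printed entries, in the same
# order as the members of Languages.
LANGUAGE_MARKS = ["укр.", "рос.", "англ."]

# Hypothetical helper mapping: marker text -> language constant.
MARK_TO_LANGUAGE = dict(zip(LANGUAGE_MARKS, Languages))


def language_of(mark: str) -> Languages:
    """Return the language constant for a marker found in the entry text."""
    return MARK_TO_LANGUAGE[mark.strip()]
```

Keeping the markers and the language constants in one place means that the parser and the visualization code can both rely on the same correspondence.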
The element of “structTerminologyBlock” type (Term block) is represented by the following variables:
• Identifier for implicit connection of the term block with the explanatory block: integer variable LinkingID
• List of term complexes in the given block: list TerminologyComplicesList of elements of TerminologyComplex type

The element of “structTerminologyComplex” type (Term complex) is represented by the following variables:
• Term: text variable Term
• Notes followed by the term (explanatory notes): list of text variables SemanticRemarksList
• Notes before the term (grammar notes): list of text variables GrammaticRemarksLeadingList
• Notes after the term (grammar notes): list of text variables GrammaticRemarksFollowingList
• Sequence number for visualization: integer variable SequenceNumber
• Language marker for visualization: text variable LanguageMark
• Language indicator: variable Language of Languages type
• Term structure indicator: variable TerminologyStructure of TerminologyStructures type
• Term type indicator: variable TerminologyType of TerminologyTypes type

The element of “structSemanticBlock” type (Explanatory block) is described by the following variables:
• Identifier for implicit connection of the explanatory block with the term block: integer variable LinkingID.
• List of definition blocks of the given explanatory block: list InterpretationsList of elements of InterpretationBlock type.
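The implicit connection through LinkingID can be sketched as a simple join between term blocks and explanatory blocks. The Python classes below are reduced, hypothetical versions of the structures just described (only the fields needed for the join are kept), and pair_blocks is an illustrative helper, not part of the described application.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TerminologyBlock:
    linking_id: int
    terms: List[str] = field(default_factory=list)


@dataclass
class SemanticBlock:
    linking_id: int
    interpretations: List[str] = field(default_factory=list)


def pair_blocks(
    term_blocks: List[TerminologyBlock],
    semantic_blocks: List[SemanticBlock],
) -> List[Tuple[TerminologyBlock, Optional[SemanticBlock]]]:
    """Join each term block to the explanatory block with the same LinkingID."""
    by_id = {block.linking_id: block for block in semantic_blocks}
    # A term block without a matching explanatory block is paired with None.
    return [(tb, by_id.get(tb.linking_id)) for tb in term_blocks]


term_blocks = [
    TerminologyBlock(1, ["ringed lizards", "worm lizard"]),
    TerminologyBlock(2, ["hypothetical term"]),
]
semantic_blocks = [SemanticBlock(1, ["definition text"])]
pairs = pair_blocks(term_blocks, semantic_blocks)
```

Storing only a shared identifier keeps the two lists independent inside the entry document while still allowing the blocks to be matched when the entry is displayed or indexed.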
The element of “structInterpretationBlock” type (Definition block) is represented by the following variables:
• Term definition: text variable Interpretation
• Identifier for implicit connection of the collocation definition block with the term collocation block, or sequence number for visualization of the definition for terms of “word” type: integer variable LinkingID
• Notes after the definition (explanatory notes): list of text variables SemanticRemarksList
• List of synonyms: list of text variables SynonymsList
• List of references to definitions in other entries: list LinksList of variables of InterpretationLink type
• List of collocations: list CollocationsList of elements of CollocationBlock type

The element of “structInterpretationLink” type (Reference element) is represented by the following variables:
• Reference term: text variable LinkingTerm
• Head term in the referenced entry: text variable LinkedDicArticleTerm
• Homonymy index of the head term: integer variable LinkedDicArticleTermOmonim
• Identifier of the referenced entry: integer variable LinkedDicArticleID
• Analogue of the term referred to in the entry: text variable ReferenceTerm
• Text marker of the reference element: text variable LinkTypeMarker

The element of “structCollocationBlock” type (Collocation block) is represented by the following variables:
• Sequence number for visualization: integer variable SequenceNumber
• List of term complexes in the given block: list CollocationsTermsList of elements of TerminologyComplex type
• List of definition blocks in the given collocation block: list CollocationInterpretationsList of elements of InterpretationBlock type

A complete diagram of the relationships between the repository class and its nested structural elements is shown in Figure 1.

Figure 1: Diagram of the relationships of the class-repository and its nested structural elements

Parsing the dictionary entries involves no special technical features.
Owing to the direct correspondence between the elements of the XML structure and the elements of the repository class, parsing is reduced to traversing this structure and filling in the corresponding elements of the class. The only parsing step that is not completely trivial is the processing of the created link elements (InterpretationLink). It is performed after the input XML file has been processed and the repository classes of all entries have been created and written to the database. Only then is it possible to search for the referenced dictionary entry by its head term (elements LinkedDicArticleTerm and LinkedDicArticleTermOmonim) and to record the ID of that entry in the reference element (LinkedDicArticleID). An example of the repository class DictionaryStorageClass is given below.

Figure 2: Class-repository DictionaryStorageClass for the entry “двохoдкові”

An integral part of modern digital dictionaries is the extensive indexing of dictionary entry elements. When developing the index class of the dictionary entry, as in the case of the repository class, we took into account that the dictionary is developed in two stages and laid the groundwork for expanding the list of head words.
Based on this, the following decisions were made:
• All elements of the index must be functionally equal
• Indexes of word terms carry information about the relevant explanatory block and about the entry in general
• Indexes of term collocations carry information about the relevant collocation block and about the entry in general
• If the information on the relevant structural element of the entry to which the index refers cannot be filled in, an appropriate mark is made

Each index element is represented in the internal model of the application by a class of the following type:
• Class of “DictionaryIndexClass” type: index element container

The class of “DictionaryIndexClass” type has the following structure in the lexicographic database:
• Inner identifier for the DB: integer variable
• Indexed term: text variable Term
• Homonymy index of the indexed term: integer variable Omonim
• Entry identifier: integer variable DicArtID
• Language indicator: variable Language of Languages type
• Term structure indicator: variable TerminologyStructure of TerminologyStructures type
• Term type indicator: variable TerminologyType of TerminologyTypes type
• Availability indicator of grammar notes for the term: Boolean variable HasTermGramRemarks
• Availability indicator of explanatory notes for the term: Boolean variable HasTermSemRemarks
• Availability indicator of explanatory notes for at least one definition of the term: Boolean variable HasInterpSemRemarks
• Indicator of whether the information on the relevant structural element of the entry has been filled in: Boolean variable InfoFilled
• Number of term definitions: integer variable InterprNum
• Number of term collocations: integer variable CollocNum
• Number of synonyms: integer variable SynonymsNum
• Number of references in term definitions: integer variable LinksNum
• Number of term complexes in the term block: integer variable TermComplicesNum
• Number of explanatory blocks in the entry: integer variable ArticleSemBlocksNum

An instance of this class is created for the following elements of the dictionary entry:
• For each term complex (words and collocations)
• For all synonyms from definitions and collocations
• For all references from definitions and collocations

For synonyms, the index contains information about the corresponding explanatory block. For references, the index can contain one of two versions of the information:
• If ReferenceTerm is found among the created indexes, the information is duplicated from it
• If the element is not found, the information is taken from the index element of the head term of the entry

The example in Figure 3 below shows the class of entry index elements DictionaryIndexClass.

Figure 3: Entry index elements for the head term worm lizard

The index of the term “worm lizard” of the entry «двохo#дкові» is as follows: Structure: Word; Type: Secondary term; Language: English; Availability of grammar notes to the term: no; Availability of explanatory notes to the term: no; Availability of explanatory notes in the entry: no; Number of definitions: 1; Number of collocations: 0; Number of synonyms: 0; Number of references: 0; Number of term complexes in the block: 4; Number of explanatory blocks: 1.

The interface (external model) of the dictionary incorporates the developments and experience gained in developing the toolkit for researching the Spanish dictionary (Diccionario de la lengua Española, 23rd ed.) [2], the Ukrainian-Polish Lexicon of Active Phraseology, and the application for visualization of the Etymological Dictionary of Ukrainian Language (EDUL), with functions for superficial analysis of the entries and their comparison with the printed version of EDUL [5].

The HTML code for entry visualization is created dynamically for all entries when the application is launched and is stored in a temporary collection in the database.
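As a rough illustration of generating entry HTML from the structured data, the Python sketch below renders a head term, its term complexes and its definitions. Everything about the markup is an assumption made for the example: the render_entry function, the class names and the use of data-* attributes to carry structural information are hypothetical and are not taken from the described application.

```python
from html import escape
from typing import List, Tuple


def render_entry(
    head_term: str,
    complexes: List[Tuple[str, str]],
    definitions: List[str],
) -> str:
    """Build an HTML fragment for one entry.

    complexes is a list of (language marker, term) pairs;
    definitions is a list of definition texts, numbered from 1.
    """
    parts = [f'<div class="entry" data-head="{escape(head_term, quote=True)}">']
    parts.append(f'<span class="head-term">{escape(head_term)}</span>')
    for mark, term in complexes:
        # The language marker is kept both as visible text and as an attribute.
        parts.append(
            f'<span class="term" data-lang="{escape(mark, quote=True)}">'
            f"{escape(mark)} {escape(term)}</span>"
        )
    for number, text in enumerate(definitions, start=1):
        parts.append(
            f'<p class="definition" data-number="{number}">'
            f"{number}. {escape(text)}</p>"
        )
    parts.append("</div>")
    return "\n".join(parts)


fragment = render_entry(
    "sample <term>",
    [("англ.", "ringed lizards"), ("англ.", "worm lizard")],
    ["first definition & note"],
)
```

Escaping every text value before it is placed into the markup keeps characters such as < and & in dictionary text from breaking the generated page.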
Using the capabilities of HTML 5 makes it possible to embed a large amount of information in the HTML code of the entries, both for the visual presentation of the entries and for interactive functionality (currently, navigation via active link elements). The entry “двохo#дкові” in HTML format and in the user’s view is given below.

HTML code:
двохo#дкові
1.
укр. двохo#дкові, –их, ім, мн, рос. двухo#дковые , англ. ringed lizards , worm lizard
червоподібні плазуни, тіло яких укрите суцільною роговою плівкою, поділеною на квадрати поздовжніми і поперечними борозенками;
User’s view: 4. Results Based on the L-system model and lexicographic database structure the following requirements were set for the dictionary interface: • Displaying the linear text of the dictionary articles with color highlight of specific structural elements of the entries • Providing access to all elements of the dictionary wordlist with the ability to use them while searching in the dictionary • Making possible to make samples conforming the parameters available in the index class (signs of dialectics, onomastics and homonymy) • Providing the possibility of conducting a full-text search on the content of the dictionary entry • Ensuring the possibility of navigating by the links from one entry to another with recording the navigation history For dictionary interface development it was decided to use .Net Core technologies to ensure multi- platform application, and WebAPI to ensure data exchange, namely the processing of queries between the client and server parts of the web application. Since the task was to visualize dictionary entries, not to edit them, the interface was developed in this regard – visualization of the entry, variations of search in the word list and making samples of dictionary entries on the available parameters. For easy creation and further development of the interface elements, a set of HTML, CSS and JavaScript scripts in the Bootstrap language was used, which ensures quick creation and deployment of necessary interface elements. The main interface elements are the word list window and the window for displaying the dictionary entry. 5. Conclusions The described parsing scheme of Dictionary of Ukrainian Biological Terminology is actually universal and suitable to be used in creating digital versions of almost any three- (and multi-) lingual terminology dictionaries based on their PDF-texts. This versatility is achieved by combining the following factors: 1. 
Applying the theory of lexicographic systems, which is universal and adequately reflects the structure of dictionaries of any kind. The three-level architecture of the L-system in the form of ANSI / X3 / Spark provides ample opportunities for conceptual generalizations, software modifications, variations of interaction scenarios of different users with the system, etc. 2. Application of methodology and technology of converting digital PDF-text of the dictionary into lexicographic database using the sequence: dictionary text in PDF => dictionary text in Word format => HTML text => XML text. This approach allows the presentation of complex structured lexicographic information in the form of a well-formed XML document reflecting the hierarchy of the information contained in a typical dictionary entry. This is achieved through the implementation of an abstract lexicographic model that adapts the semantic properties of arbitrary special information. The conversion of XML text to the lexicographic database is performed automatically, which determines the high efficiency of this parsing method. The availability of dictionary text in XML is a real prerequisite for creating various applications, including virtual systems of professional interaction such as VLL (virtual lexicographic laboratory), modification of source dictionary material, its integration into other dictionaries, use as material for creating resident systems of professional information processing (editing, abstracting, automatic translation, conceptual design and knowledge engineering, etc. [7–12]). 6. References [1] D. M. Grodzinsky, L. O. Simonenko and other, Ukrainian biological terminology Dictionary, КММ, Kyiv, 2012. [2] Real Academia Española: Diccionario de la lengua española, 23.ª ed., [versión 23.5 en línea]. URL: https://dle.rae.es. [3] V. А. 
Shyrokov (Ed.), Linguistic and information studies: works of the Ukrainian Language and Information Fund NAS of Ukraine, volume 1: Scientific paradigm and basic language and information structures, Ukrainian Lingua-Information Fund of NAS of Ukraine, Kyiv, 2018. URL: https://movoznavstvo.org.ua/files/tom_1_B5_print.pdf. doi: 10.33190/978-966-02-8683-2/8684-9.
[4] V. A. Shyrokov (Ed.), Linguistic and information studies: works of the Ukrainian Language and Information Fund NAS of Ukraine, volume 2: Grammar systems, Ukrainian Lingua-Information Fund of NAS of Ukraine, Kyiv, 2018.
[5] I. Kernerman, A multilingual trilogy: Developing three multi-language lexicographic datasets, in: Proceedings of Electronic Lexicography in the 21st Century: Linking lexical data in the digital age, eLex2015, Herstmonceux Castle, United Kingdom, 2015, pp. 372–383. URL: https://elex.link/elex2015/.
[6] L. Trap-Jensen, Lexicography between NLP and linguistics: aspect of theory and practice, in: J. Čibej, V. Gorjanc, I. Kosem, S. Krek (Eds.), Lexicography in Global Contexts, Proceedings of the 18th EURALEX International Congress, Ljubljana, 2018, pp. 25–38.
[7] L. Giacomini, Frame-based Lexicography: Presenting Multiword Terms in a Technical E-dictionary, in: Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana University Press, Faculty of Arts, Ljubljana, 2018. URL: https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/118/211/3000-1.pdf.
[8] M. Czerepowicka, The structure of a dictionary entry and grammatical properties of multi-word units, in: Electronic lexicography in the 21st century. Proceedings of the eLex 2021 conference, Lexical Computing CZ, Brno, 2021. URL: https://elex.link/elex2021/wp-content/uploads/eLex_2021-proceedings.pdf.
[9] T. Mészáros, M. Kiss, The DHmine Dictionary Work-flow: Creating a knowledge-based author’s dictionary, in: Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana University Press, Faculty of Arts, Ljubljana, 2018, pp. 77–86. URL: https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/118/211/3000-1.pdf.
[10] P. Storjohann, Commonly confused words in contrastive and dynamic dictionary entries, in: Proceedings of the XVIII EURALEX International Congress: Lexicography in Global Contexts, Ljubljana University Press, Faculty of Arts, Ljubljana, 2018, pp. 187–197. URL: https://e-knjige.ff.uni-lj.si/znanstvena-zalozba/catalog/view/118/211/3000-1.pdf.
[11] E. Sassolini, A. F. Khan, M. Biffi, M. Monachini, S. Montemagni, Converting and Structuring a Digital Historical Dictionary of Italian: A Case Study, in: Electronic lexicography in the 21st century: smart lexicography. Proceedings of the eLex 2019 conference, Sintra, 2019, pp. 603–621. URL: https://doi.org/10.5281/zenodo.3726847.
[12] J. Norri, M. Junkkari, T. Poranen, Digitization of data for a historical medical dictionary, Lang Resources & Evaluation 54 (2020) 615–643. URL: https://doi.org/10.1007/s10579-019-09468-2.