The website «Minority languages of Russia»: visualizing language data Konstantin Polivanova, Olga Kazakevicha,b, Natalia Serdobolskayaa, Zaira Khalilovaa, Elena Budyanskayaa, Anastasia Evstigneevaa, c, Karina Mishchenkovaa, d, e, Daria Mordashovaa, c, Sofie Pokrovskayaa and Evgeniya Renkovskayaa, f a. Institute of Linguistics, Russian Academy of Sciences, 1 bld. 1, Bolshoy Kislovsky Lane, Moscow, 125009, Russian Federation b. Institute of Linguistics, Russian State University for the Humanities; GSP-3, 6, Miusskaya Pl., Moscow, 125993, Russian Federation c. Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russian Federation d. School of Linguistics HSE University, 21/4, Staraya Basmannaya str., Mosсow, 105066, Russian Federation e. Ivannikov Institute for System Programming, 25, Aleksandra Solzhenitsyna str., Moscow, 109004, Russian Federation f. Institute of Oriental Studies, Russian Academy of Sciences, 12, Rozhdestvenka str., Moscow, 107031, Russian Federation Abstract The paper deals with the issue of language data visualization on the website "Minority languages of Russia". This site is created as an open information resource containing materials on the functioning and structure of minority languages of Russia and their local varieties. Methods of data visualization in five areas are considered: genealogy, areal distribution, domains of language usage, dynamics of language usage and phonetic data representation. For each area, theoretical problems, related tasks and technical implementation are indicated, and prospects for further work are discussed. Keywords Minority languages of Russia, visualization, genealogy, areal distribution, domains of language usage, dynamics of language usage, phonetic data representation, JavaScript, HTML, CSS. 1. Introduction The website "Minority languages of Russia" is created on the basis of Laboratory for Research and Preservation of Minority Languages, Institute of Linguistics, Russian Academy of Sciences, as an open information internet resource containing materials on the functioning and structure of minority languages of Russia and their local varieties. The pilot of the website is available at: http://minlang.iling- ran.ru. The main task of this resource is to systematize the available information on languages. This includes field data obtained from language experts, as well as various published materials. The website contains both modern data, which we intend to update regularly, and those related to the previous century. __________________________ Proceedings of the Linguistic Forum 2020: Language and Artificial Intelligence, November 12-14, 2020, Moscow, Russia EMAIL: polivanov.studio@gmail.com (Konstantin Polivanov); kazakevich.olga@gmail.com (Olga Kazakevich); serdobolskaya@gmail.com (Natalia Serdobolskaya); zaira.khalilova@gmail.com (Zaira Khalilova); budyanskaya.lena@gmail.com (Elena Budyanskaya); evstigap@gmail.com (Anastasia Evstigneeva); karinam6@mail.ru (Karina Mishchenkova); mordashova.d@yandex.ru (Daria Mordashova); sofie.v.pokrovskaya@gmail.com (Sofie Pokrovskaya); jennyrenk@gmail.com (Evgeniya Renkovskaya) ORCID: 0000-0003-4571-9476 (Konstantin Polivanov); 0000-0003-0597-2979 (Olga Kazakevich); 0000-0003-2417-5537 (Natalia Serdobolskaya); 0000-0003-1604-4510 (Zaira Khalilova); 0000-0002-6306-6280 (Elena Budyanskaya); 0000-0002-3413-2058 (Anastasia Evstigneeva); 0000-0001-9175-987X (Karina Mishchenkova); 0000-0002-8330-520X (Daria Mordashova); 0000-0002-4895-7069 (Sofie Pokrovskaya); 0000-0003-1944-0746 (Evgeniya Renkovskaya) ©️ 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) Further on the paper is structured as follows: §2 provides an overview of related work, §3 describes data and methods - the arrangement of the language profile in terms of content and technical implementation. §4 is the central part of the paper, it is devoted to the results of our work on language data visualization in five areas of concern. §5 discusses the future prospects of the work, and §6 summarizes the results. 1 2. Related work In the development process we take into consideration the existing tradition of sociolinguistic descriptions of the world’s languages: in particular, the results of the long-term project “The written languages of the world”, created through joint efforts of International Centre for Research on Language Planning, Laval University (Canada, Quebec) and research groups from all over the world [1, 2, 3, 4, 5]. The two-volume edition “Written languages of the world: the Russian Federation” is also a part of this project [6, 7]. Furthermore, we take into account such large-scale monographs on the sociolinguistic situation in Russia as “Languages of the peoples of Russia: The Red Book” and “Language and Society” [8, 9]. However, the online format of our resource provides more opportunities than the format of a printed publication. For instance, the information can be regularly updated to remain relevant, and special types of data can be presented (see §3 for further details). Before we started working on the website, there had already been several resources dedicated to certain regions and languages. Let's take a closer look at two examples. The website “Minority languages of Siberia as our cultural heritage” 2 is a long-term research project on three languages: Ket, Selkup and Evenki [10]. We use the experience of this resource to a large extent as far as technical implementation concerns. In particular, content management framework Drupal was chosen as the site's system core, which is well suited for working with a large amount of complex data [11]. Besides, we follow the way of presenting certain types of information, such as texts. Another online project is the Atlas of Multilingualism in Dagestan3 [12]. It is concerned with a single area of sociolinguistics (namely, multilingualism) on the basis of a limited number of languages (namely, the languages of Dagestan, Russia). The website contains an up-to-date database on Dagestanian multilingualism, equipped with a search interface. The implementation of some information blocks is similar to our website: a map with an indication of the languages and the settlements where these languages are spoken, and visualization of census data. In general, the projects on multilingualism in Dagestan and the minority languages of Siberia can be characterized as narrowly focused, while our website is aimed at a more encyclopedic approach that enables the users to obtain a wide range of information on a large number of languages accommodated in the same place. In the future we intend to cover all the languages of Russia. 3. Methods and data The main section of the site is the webpage "Languages" that contains a list of the minority languages of Russia compiled by the Institute of Linguistics, Russian Academy of Sciences4. At the same time, the languages are not just sorted alphabetically, but are grouped in language families (including isolated and mixed languages) and listed alphabetically inside language families and/or groups. Regarding the language families’ structure, we relied on the latest elaborations (for more details see §4.1). There is an access to the detailed information on every language from this list. 1We are currently working on the Russian version of the site, an English version is planned for the future. In the paper we use illustrations translated into English for the convenience of the readers. 2This resource is available at http://siberian-lang.srcc.msu.ru. 3This resource is available at https://multidagestan.com . As of November 2020, it contained the information on multilingualism of the residents of 60 villages. 4The list of the languages of Russia is presented on the IL RAS website at https://iling-ran.ru/web/ru/jazykirf . This section was prepared by Yu. B. Koryakov in collaboration with T. B. Agranat, M. A. Goryacheva, A. V. Dybo, O. A. Kazakevich, A. A. Kibrik, O. V. Khanina and A. B. Shluinsky. One of the main sources of materials is field data collected during the recent field trips (including illustrative materials: photographs, current cartographic and sociolinguistic data, video and audio recordings of the texts with transcripts). To obtain these materials, we communicate directly with language researchers (both in Russia and abroad). The experts are invited to fill out a questionnaire developed by the Lab, that enables to unify the presentation of languages on the site. The website also contains the results of the Russian Federation and the Soviet Union population censuses (1926, 1989, 2002, 2010). In addition, we draw on early accounts of the functioning of languages, mostly compiled in the last century. It allows to track the language situation development for each particular language. As for the technical implementation, page templates are built in an optimal way, semantic elements (such as header, footer, article) are used in the construction in order to clearly define the content. Unnecessary wrapper elements are excluded, thus reducing nesting and simplifying the structure of the DOM tree (the DOM object model of the document). The names of identifiers and classes of markup elements are called by understandable words in order to facilitate the reading and machine parsing. A set of SEO optimization tasks is being performed. All this provides a fast and obvious path to the data, as well as efficient indexing by search engines. The user interface is developed by means of modern data structuring and markup tools - HTML5, as well as cascading CSS3 style sheets. We select the most convenient ways of presenting the information. JavaScript and jQuery libraries are used to represent interactive elements. In particular, the leaflet library is used (with additional extensions such as "Leaflet Markerclusterer" for clustering points on the map) for programming interactive maps [13]. For developing the interactive dynamic data visualizations we use the libraries JavaScript D3 and chart.js [14, 15]. Some visualizations are built with the help of "datawrapper" - a service based on JavaScript and HTML5 that can represent the data in the form of graphs and maps [16]. The structure of the language page and the features of its technical implementation are presented in Appendix 1. 4. Results An important task in the development of the language page was to avoid the presentation of data in the form of a continuous text. It was necessary to organize a comfortable navigation through the sections of the page, as well as to enliven the presentation of the data. An interactive table of contents and various rendering methods were used to make the page more aesthetically appealing and user-friendly. During the development of the language page, we faced a number of difficulties in terms of language data visualization. The paper examines five of them: the representation of genealogy, areal distribution, domains of language usage, dynamics of language usage and phonetic data. Each subsection is structured according to the following scheme: theoretical problems, related tasks of visual representation and the technical implementation of this representation. 4.1. Representation of genealogy Genealogical relationship is basic information that is usually given in the beginning of a language description in the form of a short note about a related language group and a family; it does not imply further enumeration of related languages belonging to the same group or other groups of the same family. Still, since our website targets a wide range of users, we find it crucial to provide a complete picture of the family tree for each language. As already mentioned in §2, we follow the theoretical paradigm developed in the Institute of Linguistics regarding the list of languages and genealogical relationships. For example, we consider that separate groups of the Altaic languages are genetically related and not just form a Sprachbund. Moreover, we keep track of the latest studies in this field and present a new division within the Uralic language family [20] on our site. It is important for us to achieve the following goals: easy usage; adequate data display; visualization of the full genealogical picture (from dialect to family) with a possibility to narrow it down to specific segments. As a solution, we’ve chosen two interactive schemes that differ in the way of data visualization. The first scheme is a common representation of a language family as a genetic language tree (see Figure 1). Most trees have a lot of branches, so we provided an opportunity to collapse tree nodes to easily focus on a specific segment of the scheme. The collapsed nodes are coloured grey. To make the diagram clearer we coloured the nodes of the languages, groups of dialects and dialects in contrasting colours that distinguish them from the higher level nodes. The values of the colours are given in the diagram legend. Figure 1: A fragment of the Nakh-Daghestanian genealogical tree. The arrow points at the collapsed node that can be unfolded. The visualization is based on the D3 tree diagram. The data presented in the tree are pre-packaged in a json file. The default structure of the json file from D3 tree did not meet our goals, so we needed to expand the set of properties and values. Our json file specifies the following properties for the elements: child / parent elements and colors indicating that the elements belong to a language, a group of dialects, or a dialect. A snippet of such a file is given in Figure 2. Figure 2: A snippet of the json-file Figure 2 demonstrates the code of the json-file. In line 9, the node with the name "nakh" (in technical terminology, in this case, the attribute "name" with the value "nakh") has the parent node "nakh- daghestanian" in line 6 and child nodes with the names "Ingush" and "Batsbi" in lines 21 and 22, respectively. The "status" attributes contain RGB HTML colour codes in their values. For example, "name":"Chechen","status":"#ff0000" means that the node "Chechen" is coloured red. The code for parsing the json file and displaying it in the diagram was rewritten. The second scheme represents a language family as circles packed in one another (see Figure 3). The nested circles correspond to the levels of genealogical classification, and the intensity of the colour depends on the nesting depth. The lowest levels are marked in white. Figure 3: The Nakh-Daghestanian language family Technically, this scheme is implemented by using the js-library D3 Zoomable Circle Packing. The data are packed in a json file with the appropriate structure. Then, js parses the json file and builds a diagram. The advantage of this scheme is that it allows to see the whole family in one picture, to get the idea of its size and complexity. Nevertheless, for the moment the choice was made in favour of the tree model, because the default characteristics of the second scheme do not meet some of our goals. In particular, it is not possible to see the names of all languages at once. And what is even more important we cannot mark dialects, groups of dialects and languages with different colours (colours in this scheme are assigned automatically according to the nesting depth of a circle), and a suitable solution for this problem has not been found yet. The modification of this diagram in accordance with our tasks is the subject of further work. 4.2. Representation of areal distribution An important part of language description is the representation of the areal distribution. There are two ways of displaying this area on the map: with polygons (we paint over the territory where the language is spoken) or with markers (we mark the settlements where the language is spoken). In the future, we intend to use both options: markers - for languages with a small number of local variants, polygons - for languages with a large number of local variants. Since we started with languages with a small number of local variants, at the moment the second option with some modifications is represented on the language page. We saw our task here in the following: visualization of distribution of the local variants in the settlements; easy usage; adequate, but at the same time compact display of the data. To achieve these goals we developed an interactive map where the settlements are marked according to the local variants of the language that are spoken there (see Figure 4). Each dialect gets its colour (see two blue markers at the top of the map for Bezhta Proper dialect in Figure 4). The values of the colours are given in the legend under the map. For settlements where more than one dialect is spoken a special multicoloured marker was introduced (see the marker on the right side of Figure 4). When a marker is clicked, a list of dialects appears in a pop-up window (see Figure 5). When zoomed out, several points on the map are combined into one cluster marked with a number that indicates the amount of settlements within the radius of this cluster (see the green marker in the center of Figure 4). This makes the presentation of the data more compact. Figure 4: A fragment of the territory where Bezhta is spoken Figure 5: A window with a list of dialects that pops up when you click on a special multicoloured marker of a settlement The data are represented on the map using the leaflet JavaScript library [13]. Leaflet is a library that has proved its capability in a large number of resources. It provides great opportunities for working with geodata. Leaflet enables one to operate with any type of data (markers, polygons, lines), to use any substrates (OpenStreetMap, Yandex, GMap, etc.), and to customize the appearance of the markers and polygons displayed on the map according to one's needs. Leaflet can be extended with plugins, for example, for clustering data or for attaching json files with a database. Unlike Mapbox (a paid platform for working with GIS), there are no traffic restrictions. The geographical coordinates of the settlements are pre-packed in an array with the following properties: settlement name, dialect, and dialect code. Then the array is output to the map with marker clustering. The array elements correspond to the following model: [{latitude Coordinate}, {longitude Coordinate}, {pop-up window content}, {dialect number (more than one dialect is indicated by the letter m)}]. 4.3. Representation of the domains of language usage An important component of the sociolinguistic description of a language is a list of domains in which it functions. Thus, the volume and nature of the language use in these domains allow one to judge the vitality of the language. A list of 16 core domains was compiled, starting with family communication and education and up to the language use on the Internet. When choosing a method for representing the domains, the following tasks were primarily solved: visualization of a complete picture of the domains within a single user screen; convenience and speed of obtaining the information. This has been achieved by drawing up an interactive table of sixteen cells (see Figure 6), corresponding to the domains of the language usage. For ease of viewing and perception, the table is structured as follows: the cells of the table contain the titles of the domains, and it is noted whether the language is used in this domain or not (see the “empty or filled circle” marker in the lower right corner of each cell in Figure 6). When clicking on a cell, a pop-up window opens with expanded information about the peculiarities of the language use in this domain. Figure 6: Domains of language usage in Bezhta 4.4. Representation of the dynamics of language usage For our internet resource dedicated to minority languages, it turns out to be critical to track the change in the size of the ethnic group and the number of speakers over time in the case of each particular language. This section of the language page currently accumulates the data from the population censuses of various years, and in the future, we hope to represent the data from local administrations and researchers' estimates. Thus, the main task was to visualize the general picture of dynamics, which allows tracking the change of various quantitative indicators in time. To solve this problem, an interactive diagram was developed. It displays the change in the size of the ethnic group, the number of group members who consider the ethnic language of the group to be their mother tongue, and the number of those who reported proficiency in this language (see Figure 7). A simple charting tool - datawrapper.de - was used [16]. Figure 7: Dynamics of the Bezhta language usage 4.5. Representation of phonetic data ‘Phonetic data’ usually refers to an array of systems, describing both acoustic/articulatory phenomena and phonology. A coordinate plane, which is determined by the first and the second formants, or a trapezoid vowel diagram are often used for representation of a sound system of vowels. However, for phonological vowel systems a table structure is more relevant, as it enables to describe several differential features apart from height and backness. As for consonant sound systems, they are traditionally provided in tables: it is possible to draw a line between different types of features without data loss in this case. Language description implies a phonology description and a presentation of its sound manifestations. Our goal is to combine the acoustic and the articulatory features with the phonological representation in order to make our description compact and easy to analyze without special work with audio data. We primarily aimed at a united representation of phonetics and phonology, alongside with user- friendliness, objectivity and descriptive representativeness. As a solution, we have constructed graphic representations for vocalism (see Figure 8) and consonantism (see Figure 9) that reflect a phonological system with its correct sound manifestations. Each sound is transcribed according to the IPA5, and the whole classification is synchronized with Russian terms. An optimal representation of the structure within universal consonant and vowel tables has been elaborated, in order to facilitate the comparison of languages. 5This resource is available at: https://www.internationalphoneticassociation.org/content/full-ipa-chart. Figure 8: Votic vowel system The representation of vocalism is based on filling out an initial matrix with height values in strings and backness values in columns. We provided about two or three slots for each traditional element of classification (height/backness), as the same phoneme in the classification may display different spectrum in various languages. This solution is crucial both for language comparison and for the reflection of phonological processes. In addition to height and backness, there are secondary articulations significant for phonology: vowel length, labialization, nasalization and pharyngealization. For each characteristic there exists an additional string or a column, which tentatively accounts for the articulation manner and its impact on the spectrum. Thus, pharyngealized vowels are placed in columns to the right of the unpharyngealized ones, and nasal vowels are placed in strings below the non-nasal ones. The length of the vowel is marked when it is relevant: long vowels take place closer to the cardinal locus than their short pairs. The recited secondary articulations are marked by diacritics, while labial vowels have their own symbols in the IPA. Traditionally labials are placed to the right of their illabial pairs. If the language’s vowel system lacks nasalization/pharyngealization/length, the corresponding table cells are removed from the initial matrix. Primary strings and columns (based on height and backness) are preserved, empty cells are visualized as an empty space. It is necessary to distinguish the phonologically similar vowels with different acoustic features in different languages. Then a trapezoid is superimposed on the table: phonemes with the same height value adjoin to its horizontal contours, while phonemes with the same backness value adjoin to its vertical contours, so that each line corresponds to phonologically meaningful height/backness. Figure 9: Forest Enets consonant system The universal table of consonants is defined by the place and manner of articulation as columns and strings respectively. Secondary articulations refer to the place of articulation. In particular, labialization or pharyngealization splits each column into additional subcolumns. Labialized consonants are placed to the left of the neutral column, while pharingalized ones are placed to the right of it. Palatalization is described as follows: subcolumns are usually added to the right of the initial column in case of labial and coronal consonants, while in case of dorsal consonants they are added to the left. Abruptives are put in substrings below each string for pulmonic consonants. The last splitting of columns is done according to the state of the vocal fold (4 subcolumns: voicelessness, aspiration, glottalization and voicing). This level of organization has colour marking. After the initial universal table is filled out, strings and columns with empty cells are removed from it. Therefore, we end up with a compact table describing the consonant sound and phonological system of a given language. 5. Discussion The visualization solutions we have adopted for the current version of the website can be further refined. We have already provided some development options for certain sections of the language page. For instance, the visualization of genealogy in the shape of nested circles can be proposed as an alternative representation, its advantages were formulated in §4.1. We intend to search for the ways of modifying this representation so that it could meet our requirements. In particular, it is essential to display the language names at several specified levels, as well as to indicate languages and dialects by means of different colours. For now, these colours are marked up in this scheme by default according to certain nesting characteristics. In §4.2, we have mentioned several options for representing the areal distribution of the language. At present, we implement the strategy of indicating the settlements with markers on the language page. In the future, we aim to represent the minor languages of Russia on a separate map located on the main page of the website. A polygonal representation enabling the users to sort and display the languages according to their choice seems to be the most suitable solution for this task. In this case, it will be necessary to provide a possibility of switching from the map to the pages of the corresponding languages. As regards the dynamics of language usage (see §4.4), there are also prospects for further work in terms of visualization. In particular, it is planned to expand the available data through the researchers’ estimations (for those languages that linguists work in the field with), as well as by using the information that can be received from local administrations. These data would significantly specify the information provided in censuses. In addition, we intend to take into account the results of the Russian Census 2021. As for the visualization of phonetic data (see §4.5), the accumulation of material on a larger number of languages will possibly allow us to further adjust the strategy for representing the systems of vocalism and consonantism. Moreover, it is planned to transform the phonetic tables into interactive ones: clicking on the phoneme symbol, the corresponding audio file will be played. 6. Conclusion In this paper we considered the methods of language data visualization on the website "Minority languages of Russia" in five areas of concern: genealogy, areal distribution, domains of language usage, dynamics of language usage and phonetic data representation. Visualization mechanisms were implemented by means of JavaScript tools, as well as available JavaScript libraries. However, the nature of the data dictated the need to modify the standard functionality of JavaScript libraries and the layout embedded in them, so that the representation in each case could adequately meet our tasks. The layout of visual representations is developed with the help of HTML and CSS, namely HTML5 and CSS3. More detailed guidelines on a visual representation of our resource can be found at: https://minlang.site/about#tech-materials. 7. References [1] H. Kloss , G. D. McConnell (Eds.), The written languages of the world: A survey of the degree and modes of use, volume 1: The Americas, Laval University Press, Quebec, 1978. [2] P. Padmanabha, B. P. Mahapatra, V. S. Verma, G. D. McConnell (Eds.), The written languages of the world: A survey of the degree and modes of use, volume II: India (2 books), Laval University Press, Quebec, 1989. [3] H. Kloss, A. Verdoodt, G. D. McConnell (Eds.), The written languages of the world: A survey of the degree and modes of use, volume III: Western Europe, Laval University Press, Quebec, 1989& [4] G. D. McConnell, Tan Ke Rang (Eds.), The written languages of the world: A survey of the degree and modes of use, volume IV: China (2 books), Laval University Press, Quebec, 1995. [5] G. D. McConnell, Les Langues Écrites du Monde: Afrique Occidentale, Les Presses de l'Université Laval, Québec, 1998. [6] G. D. McConnell, V. M. Solntsev, V. Yu. Mikhal'chenko (Eds.), Pis'mennye yazyki mira: Rossiĭskaya Federatsiya. Sotsiolingvisticheskaya entsiklopediya {The written languages of the world: Russian Federation. A sociolinguistic encyclopedia}, volume 1, Academia, Moscow, 2000. [7] V. Yu. Mikhal'chenko (Ed.), Pis'mennye yazyki mira: Yazyki Rossijskoj Federatsii.{The written languages of the world: Russian Federation.}, volume 2, Academia, Moscow, 2003. [8] V. P. Neroznak (Ed.), Yazyki narodov Rossii: Krasnaya kniga. Entsiklopedicheskij slovar'- spravochnik {Languages of the peoples of Russia: The Red Book. An encyclopedic dictionary.}, 2nd ed., Academia, Moscow, 2002. [9] V. Yu. Mikhal'chenko (Ed.), Yazyk i obshchestvo. Entsiklopediya {Language and society. An encyclopedia.}, «Аzbukovnik», Moscow, 2016. [10] O. A. Kazakevich, M. I. Vorontsova, E. L. Kliachko, K. K. Polivanov, Multi-Functional Web-Site “Minority Languages Of Siberia As Our Cultural Heritage”, in: Materials accepted for publication on the website of the 19th International Scientific Conference on Computational Linguistics "Dialogue 2013", Moscow, 2013 (electronic publication). URL: http://www.dialog- 21.ru/digests/dialog2013/materials/pdf/KazakevichOA.pdf. [11] Drupal: an open source content management software written in PHP and distributed under the GNU General Public License. URL: https://www.drupal.org/. [12] N. Dobrushina, D. Staferova, A. Belokon (Eds.), Atlas of Multilingualism in Dagestan Online, Linguistic Convergence Laboratory, HSE, 2017. URL: https://multidagestan.com, accessed on 2020-11-19. [13] Leaflet: an open source library written in JavaScript designed for displaying maps on websites. URL: https://leafletjs.com/. [14] D3: a JavaScript library for creating dynamic interactive data visualizations in web browsers. URL: https://d3js.org/. [15] Chart: an open source JavaScript library for data visualization that supports 8 chart types. URL: https://www.chartjs.org/. [16] Datawrapper: a tool for creating interactive maps and diagrams. URL: https://www.datawrapper.de/. [17] M. P. Lewis, G. Simons, Assessing Endangerment: Expanding Fishman’s GIDS, in: Revue Roumaine de Linguistique/Romanian Review of Linguistics, volume 2, 2010. [18] MediaElement: a powerful JS/HTML5 audio and video library that creates a unified view for media files. URL: https://www.mediaelementjs.com/. [19] OwlCarousel: a jQuery plugin, which allows to create sliders and carousels. URL: https://owlcarousel2.github.io/OwlCarousel2/. [20] J. Saarikivi, The divergence of Proto-Uralic and its offspring. A descendent reconstruction, in: M. Bakro-Nagy , J. Laakso, E. Skribnik (Eds), Oxford Guide to the Uralic Languages, Oxford University Press, forthcoming. Appendix 1 Language page structure Language page section Section content Technical implementation Basic information in tag The section contains 4 tags: Each tag occupies a separate format “Native speakers”, “Area”, cell in the database, which “Family” and “EGIDS Status” makes it possible to use the (for more details on the EGIDS tag as a filter or a criterion scale, see [17]). for sorting. At the layout level, the tags are packaged in a separate HTML block, which receives a graphic design different from the rest of the text using CSS styles. Brief information The section contains a brief The page contains only a few introductory text that provides lines of text and the "read information about areal more" button, which unfolds distribution, number of the rest of the text field below. speakers, dialect structure, In the unfolded state, the traditional lifestyle of the "collapse text" button appears people, etc. after the text. After clicking on it, the text collapses to its original position. This mechanism is built on JavaScript. The layout is performed using HTML/CSS. The graphical solution intuitively allows the user to understand how the mechanism works. Genealogy The section describes the See §4.1. genetic affiliation of the language and its dialect structure. The genealogy is visualized as an interactive diagram. Areal distribution The section provides the areal See §4.2. characteristics of the language, reflected on the interactive map. Language contacts and The section describes the The section is technically multilingualism languages that are (or implemented in the same way historically have been) in as the section “Brief contact with the language in information” (see above). question, as well as the multilingualism among its speakers. Language functioning The section provides the At the database level, each following information about section element occupies a the language: legal status, separate cell, which allows to writing system, language work with individual data standardization. The domains elements, as well as to apply of language usage are also them as filter and sort criteria. described here. This section is The section "Language one of the most important functioning" is divided into 4 sections of the site. subsections: "Legal status", "Writing system", "Language standardization", and "Domains of language usage". These subsections are represented by tabs, each of which contains a corresponding subsection. The tabs are designed in a separate HTML block, highlighted in color. Inside the block, a panel with the names of four subsections is displayed. These subsections serve as buttons for switching between the tabs. The button corresponding to an open tab is highlighted in the active color. The content of only one active tab is displayed under the button panel. This solution allows to present a large amount of information in a compact way, as well as to immediately provide the list of all the subsections. The mechanism is built by means of JavaScript and CSS. See §4.3 on the domains of language usage. Dynamics of language usage The section provides See §4.4. information on the changes in the language usage based on the census data (in the format of an interactive diagram) and on the degree of the language vitality at present. Language structure The section provides brief The section is presented as a information on the language grid of four blocks structure within 4 main levels corresponding to the (phonetics, morphology, subsections "Phonetics", syntax, vocabulary). This "Morphology", "Syntax" and section is planned to be "Vocabulary". Each block expanded in the future. contains the name of the subsection, a short description, and the "more details" button, which opens the corresponding information in a pop-up window. The pop-up window mechanism is implemented using JavaScript, and the visual solution is implemented using CSS styles. See §4.5 on the phonetic data. Language research The section provides The data are presented in a information on the history of table with two columns. This language research and format was chosen for compact indicates the relevant experts data representation. The left in this language and the column contains the following centers where the language information: a photo (if absent, research is conducted. an abstract image in the form of a human silhouette is used), the full name and the affiliation of the expert, and a link to the expert’s personal page. The right column contains a summary of the expert’s research activities. This mechanism is implemented by means of HTML/CSS. Main publications The publications are grouped Information is presented in the into the following sections: "accordion" format used in the grammars and grammatical web layout for compact data essays, dictionaries, selected representation. works on certain aspects of This section contains a list of grammar, publications of texts clickable titles. Only one in the language, works on header can remain open: when sociolinguistics and ethnology. you click on any title in this view, the previous open one is closed. This mechanism is implemented using JavaScript and CSS3. Resources The section provides links to The data are presented as a available electronic resources text field and a table with two (corpora and text collections, columns. This format was dictionaries, etc.). chosen for compact data representation. The left column contains the following information: the logo (if available) and the name (with a link to the resource). The right column contains brief information about the resource. This mechanism is technically implemented using HTML/CSS. Administrative and public The section provides The information is presented as support information on the a text field and a table with two administrative and / or public columns. This format was support for the language, chosen for compact data including the support from representation. The left column non-governmental contains the following organizations and language information: a logo (if activists. available) or a photo (in the case of a person) and a name or a full name (in the case of a person). The right column provides a summary of their activities. This mechanism is technically implemented using HTML/CSS. Data source The section indicates the The data is wrapped in an experts who provided the data individual HTML block and is on the language. stylistically highlighted in color. The implementation is built on HTML/CSS tools. Text The section contains video and Video files are integrated from / or audio recordings of the YouTube. The website has its texts with transcript and own channel. Audio files can be morphological annotation. integrated with SoundCloud. A Each text is accompanied by corresponding template is detailed metadata (general written for this purpose. Audio data about the text, files can also be uploaded information about the directly to the template on the recording, transcript and website in OGG, MP3, or WAV morphological annotation). formats. In this case, the library MediaElement.js is responsible for displaying the files [18]. Metadata markup for the texts is presented as separate HTML blocks, stylized in such a way that each element and its caption are distinguished. At the database level, each element has a separate cell. The transcript and morphological annotation are uploaded as PDF files. This block is implemented using JavaScript, HTML, and CSS. SoundCloud and YouTube embed integration tools as well as the library MediaElement.js were used in the development process. Photographs The section contains In this section we employ a photographs of native "carousel", which is one of the speakers, settlements, well-known ways of displaying traditional household items, images. It consists of multiple clothing, etc. slides and navigation elements between them. Each slide contains an image and a caption. Only one slide is shown at a time. There are buttons on the right and on the left sides of it to switch between the slides. There is a panel with dots under the slide. The number of dots corresponds to the number of slides. When a dot is clicked, the user is transferred to the corresponding slide in the array, and the dot is highlighted in the active color. When you click on a slide, the image opens without cropping. Its original proportions are preserved, although it is limited in weight and scale. The mechanism is implemented using the jQuery library owl.carousel.js [19]. The markup and styling are performed by means of HTML/CSS. Image files with .png, .gif, .jpg or .jpeg resolutions and any aspect ratio can be uploaded to the template. The images are then passed through a program that automatically processes them for displaying in the carousel.