Transforming Archived Resources with Language Technology: From Manuscripts to Language Documentation

Niko Partanen¹, Rogier Blokland², Michael Rießler³ and Jack Rueter¹
¹ University of Helsinki, ² Uppsala University, ³ University of Eastern Finland

Abstract
Transcriptions in different languages are a ubiquitous data format in linguistics and in many other fields in the humanities. However, the majority of these resources remain both under-used and under-studied. This may be the case even when the materials have been published in print, but it is certainly the case for the majority of unpublished transcriptions. Our paper presents a workflow adopted in the research project Language Documentation Meets Language Technology, which combines text recognition, automatic transliteration and forced alignment into a process that allows us to convert earlier transcribed documents into a structure comparable with contemporary language documentation corpora. This involves complex practical and methodological considerations.

Keywords
documentary linguistics, language technology, text recognition, forced alignment, Zyrian Komi

1. Introduction
In many fields in the humanities there is a long history of collected materials that have been stored in various archives. One type of such material is formed by hand- or typewritten linguistic transcriptions, sometimes accompanied by interlinear glosses and translations. The exact transcription and annotation conventions vary between research traditions, but vast collections of notebooks containing transcribed and translated materials in endangered languages are still ubiquitous in the field of linguistics. In linguistic research on Uralic languages (and other languages beyond our scope) there is a long history of publishing these transcriptions in printed volumes, but it is almost impossible to estimate how much material still remains unpublished and thus basically inaccessible for research.
The 6th Digital Humanities in the Nordic and Baltic Countries 2022 Conference, Uppsala, Sweden, March 15–18.
niko.partanen@helsinki.fi (N. Partanen); rogier.blokland@moderna.uu.se (R. Blokland); michael.riessler@uef.fi (M. Rießler); jack.rueter@helsinki.fi (J. Rueter)
https://researchportal.helsinki.fi/en/persons/niko-partanen (N. Partanen); https://uefconnect.uef.fi/henkilo/michael.riessler (M. Rießler); https://researchportal.helsinki.fi/en/persons/jack-rueter (J. Rueter)
ORCID: 0000-0001-8584-3880 (N. Partanen); 0000-0003-4927-7185 (R. Blokland); 0000-0002-2397-2860 (M. Rießler); 0000-0002-3076-7929 (J. Rueter)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Furthermore, we argue that in many cases even the printed versions are not as useful as they could potentially be. The reasons are primarily technical: a printed page in a graphically complex linguistic transcription (typically applied in these data sources) is as non-searchable for modern computer systems as a handwritten or typed transcription. At the same time the digitization process that can be applied to these sources is remarkably similar, as we describe in this study. There is thus a clear need and motivation to make this type of material more accessible. An excellent example is the EuroBABEL project,¹ which published transcribed, translated and annotated versions of texts in Ob-Ugric languages (Uralic) originally published in printed works written by various linguists. These texts were digitized, analysed using the FLEx linguistic annotation software and made available as a database.
The American Philosophical Society is also currently running a long-term project in which more than 2000 pages of Tunica (a language isolate spoken in the United States) materials are processed in a comparable way.² This illustrates that processing a collection of this size is usually an undertaking lasting several years and involving six-figure budgets. We believe this is often necessarily the case: these old archival materials are so complex and multi-layered that publishing them in modern editions is always a large and complex task. Much of the work that needs to be done also concerns analysing the texts in detail within the context in which they were created, and connecting them to other archival sources where possible. Ideally, a digital edition of a transcription is not just an online transcription, but an attempt to make the work understandable and useful for modern audiences, most importantly in ways that support the language communities from which the materials originate. This vision of re-publishing or corpus building with transcribed materials requires that we also look for ways to shift the workload away from the most manual phases, so that specialists and community members can focus on higher-level tasks which arise mostly after the text is digitally readable and searchable. Although the work on text recognition is very valuable in itself, we additionally take into account the context in which recordings of the transcriptions are also available. This is closely connected to the history of audio recording methods and the use of this technology in linguistics: the period when the work was conducted largely determines whether there even could be a recording. The fact also remains that not everything has been archived, and, even when something has been, whether the materials are findable or still usable are separate questions; see also Poa & LaPolla [1, 351].
However, on an optimistic note, we believe that in many cases the audio recordings do exist, can be found and may already be digitized. We want to note here that digital re-publishing of this kind of material customarily involves three different steps: text recognition, transliteration, and forced alignment. The first is the task that extracts searchable text from an image. We can distinguish optical character recognition (OCR) from handwritten text recognition (HTR), but the concrete technical differences are minor nowadays. Transliteration is the process in which we convert digitised character strings from one writing system to another, e.g. changing the writing system from Latin to Cyrillic, or from one expert transcription system to another. In many cases this task is similar to normalization, where we would just change some of the spellings, for example.

¹ OUL and its succeeding project OUDB, see https://www.babel.gwi.uni-muenchen.de/.
² https://www.amphilsoc.org/blog/cnair-awarded-grant-develop-digital-linguistic-resource-tunica-biloxi-tribe-louisiana

The last step in the workflow is forced alignment. This means automatically aligning the textual representation with the audio segments where the words or sentences have been uttered. The importance of aligned transcriptions was strongly emphasized early on in language documentation; see, e.g., the discussions in [2]. In our experience the current technical solutions are satisfactory for almost all the steps that can be identified in a pipeline that transforms these materials into a structure close to contemporary language documentation corpora. This approach was already described by Blokland et al. [3], and our work builds on it. There has been increased discussion over the last years on using technology in endangered language contexts.
For example, although there has been some success with automatic speech recognition (ASR) systems for endangered languages [4][5], it has also been pointed out that in some contexts unassisted transcription may simply be preferred [6]. Such viewpoints are very important, and it is crucial not to present technical advances automatically as real improvements, especially before tangible long-term results can be presented and demonstrated. In our context, transcribing the materials again would also mean repeating work that was already done decades ago, which seems hard to justify. This situation is obviously very different when no previous transcriptions exist. To clarify our context further, the present study therefore also aims to discuss the use of a set of related technologies within the internal data management workflow of a language documentation project during the period 2017–2021. As the need for further documentation of endangered languages continues to be a global issue and undertaking, we believe this context will remain relevant, but we also acknowledge that we discuss the relatively narrow working environments and goals of academic research groups at European universities.

2. Case study
In our case study we use the transcriptions that the Permian Komi linguist Raisa Batalova (1931–2016) produced in 1971 from Zyrian Komi recordings made by the Finnish linguist Erkki Itkonen (1913–1992) in the Komi Republic between 21 December 1957 and 1 January 1958 [7]. The pluricentric Komi language is a Uralic language, related to e.g. Finnish, North Saami and Hungarian. It has three main variants, Zyrian Komi, Permian Komi and Yazva Komi, all spoken in the northeast of European Russia. Zyrian Komi is spoken mostly in the Komi Republic, and has approximately 170,000 speakers. As the language is related to Finnish, Finnish Finno-Ugrists have always shown a strong interest in it, though it was not easy to visit the Komi-speaking regions during Soviet times.
However, Itkonen managed to visit the Komi Republic in 1958, becoming the first Finnish linguist to do so since 1907 [7]. During this trip Itkonen was mostly in Syktyvkar, the capital of what was then the Komi Autonomous Soviet Socialist Republic, and made a number of recordings of the language, both of the standard language and of dialects, as spoken by linguists he met at the Komi section of the USSR Academy of Sciences, and by teachers and students at the Pedagogical Institute. Back in Finland, Itkonen used these recordings for his own notes on Komi; in the late 1950s he transcribed and translated five of the recordings into Finnish for his private use. In 1971 the Finno-Ugric Society and the Tape Archive of the Finnish Language paid the linguist Raisa Batalova a stipend to transcribe all the recordings and translate them into Russian [8, 508]. Batalova transcribed and translated a total of 364 pages. Both the recordings and the transcriptions are archived in the Tape Archive of the Finnish Language in the Institute for the Languages of Finland, and we have currently processed approximately one third of them. The present dataset therefore contains 119 pages of handwritten transcriptions and their Russian translations. The current texts comprise approximately 18,000 tokens of transcribed Komi, and the whole transcribed material will likely amount to approximately 50,000 tokens. Figure 1 shows a small sample of the transcription style.

Figure 1: Example of Raisa Batalova's transcribed lines [Institute for the Languages of Finland]

The sample can be represented in Unicode characters as follows (the English translation has been added by us): mian kołva s'ikt sułałe̮ kołva ju bereg dori̮n, ǯuǯi̮d ge̮ra vi̮łi̮n. s'ikti̮n stavi̮s veti̮mi̮n kimi̮n kerka. unži̮ki̮s 'Our Kolva village stands by the bank of the Kolva river, on a high hill. There are all in all about fifty houses. The most …'.
The Russian translation is handwritten on the adjacent page. Following our workflow, we can then connect this representation to the audio, as the original reel-to-reel tapes have been digitized. This is shown in Figure 2.

Figure 2: Word-aligned data in ELAN [Institute for the Languages of Finland & IKDP-2 project]

2.1. Experiment design
To complement the dataset of 119 pages that was processed in our project, we selected 12 pages that were further annotated with word-level alignment between the audio and the transcription text. This includes four recordings of the Ižma dialect, which are not included in the text recognition experiments reported here. The setup is artificial, as in practice the Ižma transcriptions, i.e. those of the dialect that our working group mainly focuses on, were of course among the most interesting for us. This also justified the more extensive work on these transcriptions, as the word-level alignment was an extra step that we would not normally take. However, it was necessary in order to evaluate the forced alignment accuracy.

For a text recognition experiment we needed manually corrected lines. Here we formatted the lines so that the content being evaluated consists of correctly detected lines of Komi transcription. In this way we can evaluate text recognition accuracy consistently, regardless of issues possibly caused by layout detection. It is worth noting that if the workflow were applied without manual correction and supervision, the errors would cumulatively influence the results of each additional step. This is not addressed in the current work. For the transliteration experiment the recognized texts are aligned word by word with the orthographic variants and the audio.

3. Processing workflow
3.1. Layout detection
As the original pages essentially contained horizontal lines on a page, the layout in itself was not very complex.
There were, however, issues in detecting all the lines entirely, and the built-in layout detection models customarily left parts of the lines undetected, especially at the beginning and at the end. The quality was workable, but each page took approximately 15 minutes of manual correction before text recognition could be applied. We trained two layout detection models with the P2PaLA method [9]. The first model used 37 pages (1,113 baselines), the second 61 pages (1,798 baselines). Both models improved on the baseline formed by the models built into Transkribus. The trained model detected lines only within a full page-sized text area. This demonstrates that training custom line detection models is applicable even in a context where the available data is relatively small. At the later stage there was no essential improvement in line detection: only a few manual corrections were necessary, and these were partly based on preference. It seems likely that the rest of Batalova's collection can be handled with the current model with no adjustments.

While the recognized text was corrected, we manually assigned three tags ('area-komi', 'area-russian', and 'other'). These were both text-area and line-level tags, and essentially differentiated which pages had Komi text and which Russian text. In principle, for some pages, the differentiation could have been carried out differently, by assigning a page-level attribute that specified the language, but some pages were partly in Komi and partly in Russian, the split usually being where the Komi text ended and the Russian translation began. The tags were assigned manually, as it was a very fast task to assign them and manually split the elements when needed (approximately two hours of manual work for the whole collection; the remaining two thirds could be tagged in less than one workday); it also functioned as an extra quality check. The tag 'other' primarily contains notes and metadata that do not connect to the running text.
It was essential for us to be able to extract from the pages only the text that has a correspondence in the audio. There are additional layout elements, such as page numbers, titles and metadata information at the beginning of the texts. These were ignored in the layout detection phase, as their separation from other content was not essential for the current work, and tagging them manually if needed would be similarly simple.

3.2. Text recognition
Following an already established example by Petzell [10], who used text-recognized Swedish dialect texts, our research group was able to build a handwritten text recognition model in Transkribus [11] with highly functional accuracy. Although the work was conducted within a larger research process, we used the materials we had created to evaluate the process in more detail. We selected an incrementally growing set of pages on which the HTR models were trained. Each model was tested against the same test set. Similar tests have been done for printed text recognition [12] and speech recognition [4]. This is a very effective design, as it shows clearly where the thresholds are in the applicability of the current technology. We used the PyLaia engine [13] with 200 epochs and early stopping after 20 epochs.

Experiment   Lines   Words   CER (%)   WER (%)   Training time
10 pages       289    1591      20.5      67.5        19m 41s
25 pages       662    3657      10.3      41.4        22m 48s
50 pages      1300    7315       7.6      32.7        32m 29s
75 pages      1977   10911       6.5      29.1        44m 10s
100 pages     2677   14807       6.0      26.9        51m 26s

Table 1: Results of text recognition experiments.

The results show that after 50 pages the quality increases at a significantly slower rate. Another significant observation is that even with 10 pages the character error rate is only 20.5%. This means that four in five characters are correct. As a consequence, it makes sense to train an HTR model at a very early stage of the transcription work, as creating new proofread pages then becomes progressively faster.
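The CER and WER values in Table 1 are standard edit-distance metrics. As a point of reference, they can be reproduced from pairs of corrected and recognized lines with a short Levenshtein implementation; this is a generic sketch, not Transkribus's internal evaluation code, and assumes a non-empty reference:

```python
# Minimal sketch of character/word error rate via Levenshtein distance.
def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edits per reference character."""
    return levenshtein(reference, hypothesis) / len(reference)

def wer(reference, hypothesis):
    """Word error rate: edits per reference token."""
    ref_tokens = reference.split()
    return levenshtein(ref_tokens, hypothesis.split()) / len(ref_tokens)

# e.g. one manually corrected line vs. a hypothetical HTR output
# (one deleted character):
print(cer("mian kołva s’ikt", "mian kołva sikt"))
```

Since WER counts whole-token mismatches, a single wrong character in a word already makes the whole token wrong, which is why WER runs far above CER in Table 1.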
Thirdly, we can state that the whole training experiment is resource-efficient and fast, as the training time on the server remains very short.³ As the whole collection is over 300 pages, it is very likely that small increases in accuracy will continue to occur as the work expands to new pages.

³ As a disclaimer, we are not aware of documentation that describes the exact setup on the Transkribus servers, but a training time of under one hour is clearly modest in the wider machine learning context.

The retrieved text is in the Finno-Ugric transcription, with the originally used character set represented as closely as possible. Different characters have been carefully distinguished, although there are often several suitable Unicode values that could be used. Again, as long as the choice is systematic, this makes little or no difference from the point of view of subsequent tasks or uses of this data.

3.3. Transliteration
After this, we applied a rule-based transliteration script that transformed the original transcription into the contemporary Cyrillic orthography used for Komi (which is similar, but not identical, to Russian Cyrillic). The reasons we prefer an orthographic representation in language documentation, rather than a scientific transcription, were originally argued in Gerstenberger et al. [14, 35–36]. First, this makes the work useful to the language community, which is already familiar with the orthography. At the same time, an established orthography can function as a middle stage before a more detailed transcription is made, if that is needed or desired.

The transliteration script was written in Python and uses a set of sequentially applied rules. The challenge encountered is that the Komi standard language has 40 phonemes (36 native and 4 in loanwords: /t͡s/, /x/, /f/, /rʲ/) represented by 35 characters in the modern standard language.
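The systematicity of character choices matters because Unicode often offers several encodings for the same grapheme. A small stdlib illustration (the example characters are ours, chosen to mirror the transcription's diacritics):

```python
import unicodedata

# "ž" can be stored precomposed (U+017E) or as z + U+030C COMBINING CARON;
# NFC normalization collapses both spellings to the single codepoint.
assert unicodedata.normalize("NFC", "z\u030C") == "\u017E"

# The central vowel i̮ (i + U+032E COMBINING BREVE BELOW) has no precomposed
# codepoint, so it remains a two-codepoint sequence under normalization.
i_central = "i\u032E"
assert unicodedata.normalize("NFC", i_central) == i_central
assert len(i_central) == 2
```

Fixing one normal form (e.g. NFC) at export time keeps string matching and downstream transliteration rules deterministic.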
Komi features a set of consonants that can be classified according to a palatal dichotomy, similar to the palatalization dichotomy in Russian; hence the Russian use of a soft sign or of fronting and non-fronting vowels has been adopted for Komi in the post-1938 era. Table 2 presents four different representations of the consonants found in the transcriptions, where the leftmost column (Batalova) and the rightmost (modern Cyrillic) indicate the source and target representations for the present project, respectively. The so-called Molodtsov alphabet (a Cyrillic-based alphabet used for Komi in the 1920s and 1930s) and IPA representations – which are potentially relevant for other similar projects on Zyrian Komi – are also given to illustrate the underlying system. The approach presented here works equally effectively for transliteration between these writing systems, which is why we illustrate them here as well.

Batalova   Molodtsov   IPA    Cyrillic

Palatal-neutral (see ‹b|v|g|ž|k|m|p|r|f|x|c|š› to ‹б|в|г|ж|к|м|п|р|ф|х|ц|ш›):
š          ш           ʃ      ш (а|и|у|е|о|ы|ӧ)

Symmetric pairs (see also ‹s|d|t|n|l› to ‹с|д|т|н|л›):
z          з           z      з (а|і|у|э|о|ы|ӧ)
z’         ԅ           ʑ      з (ь|я|и|ю|е|ё|ьы|ьӧ)

Asymmetric pairs:
ǯ          җ           dʒ     дж (а|и|у|е|о|ы|ӧ)
ʒ’         ԇ           dʑ     дз (а|и|у|е|о|ы|ӧ)
č          щ           tʃ     тш (а|и|у|е|о|ы|ӧ)
c’/č’      ч           tɕ     ч (Ø|а|и|у|е|о|ы|ӧ)

Table 2: Phonetic equivalents in Batalova, Molodtsov, IPA, and modern Cyrillic orthography.

A Python script divided the transliteration task into five sequential sets: word-initial, preletter, letter, pair-vowel and other-letters. The word-initial and preletter sets were used for dealing with multiple-character to single-letter conversion, i.e. for handling palatal glides and the combining Unicode character conversion of the central vowels e̮ to ӧ and i̮ to ы, respectively. These steps were also used for removing labialization and accent marking.

When the orthographic text was analysed with a Komi morphological analyser [15], the initial accuracy was 79.5%.
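The five-stage design can be sketched as an ordered list of rule sets applied in sequence. The mappings below are illustrative fragments we invented for the example, not the project's actual rule tables:

```python
import re

# Hypothetical sketch of sequentially applied transliteration rule sets.
# Each set runs to completion before the next one starts, so multi-character
# units are consumed before single-letter rules can break them apart.
RULE_SETS = [
    # word-initial: e.g. a palatal glide + vowel at the start of a word
    [(re.compile(r"\bje"), "е")],
    # preletter: combining-character central vowels (e̮ -> ӧ, i̮ -> ы)
    [(re.compile("e\u032E"), "ӧ"), (re.compile("i\u032E"), "ы")],
    # letter: plain consonant correspondences
    [(re.compile("š"), "ш"), (re.compile("ž"), "ж"), (re.compile("č"), "тш")],
    # pair-vowel: palatalized consonant + vowel pairs before the bare consonant
    [(re.compile("z\u2019a"), "зя"), (re.compile("z\u2019"), "зь")],
    # other-letters: remaining one-to-one residue
    [(re.compile("k"), "к"), (re.compile("t"), "т"), (re.compile("s"), "с")],
]

def transliterate(text):
    for rules in RULE_SETS:          # sets are applied strictly in order
        for pattern, target in rules:
            text = pattern.sub(target, text)
    return text

print(transliterate("či\u032Eš"))  # č + i̮ + š → "тшыш"
```

Ordering is the essential design point: because i̮ is rewritten in the preletter stage, the later single-letter rules never see the bare combining diacritic.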
This accuracy is considerably lower than the accuracy usually reached with written texts, or with dialect texts for which the analyser has been sufficiently adapted to the dialectal features, as described by Rueter et al. [15]. However, the character error rate compared against manually verified Ižma wordforms was only 4.2%. This indicates a difference between some of the Komi dialects in the collection and the coverage of the analyser outside the already adjusted dialects.

The Finno-Ugric transcription system as such has been widely used, also for Komi. Yet the published texts are not currently available in digital versions, there are only a few instances of existing Cyrillic versions of the texts, and different publications differ from each other in transcription details. Batalova's text studied here also contains conventions of its own that are not widely seen in other works. This creates a situation in which no exact training data exists that could be used. At the same time, the approach presented above could fairly easily be extended to other publications that use their own transcription systems.

3.4. Forced alignment
Forced alignment refers to the task in which a text and the corresponding audio are aligned with one another. This can be done at different levels, which are usually utterances, words, phonemes or phones. In our study the texts and corresponding audio files were aligned with a forced alignment system described by Leinonen et al. [16] that is currently available in the Language Bank of Finland. We used this implementation on CSC's Puhti infrastructure, as Komi was added to this system in 2021 under the macrocode kv. This code refers to both Permian Komi and Zyrian Komi, and indeed the current setup works for both main varieties.

The forced alignment system of Leinonen et al. [16] is an example of a cross-lingual application in this domain. The idea is that an alignment system is trained for one language and applied to another.
As mentioned above, this task can be done with varying granularity. Matching longer sentences is less exact and has more margin for error than matching words, and even more so when we discuss phonemes and phones, where the units are already very language-specific in themselves. There are already examples of using this approach in language documentation contexts. For example, an Italian model has been used for Australian Kriol, one reason for this being that the two languages have similar vowel systems [17, 285]. The system we used was likewise based on a Finnish-language model, the idea being that this would serve as an adequate starting point for cross-linguistic forced alignment. The study of Leinonen et al. [16] already carried out tests on Finnish, Estonian and North Saami, and our work contributes to this line of research on the Uralic languages, in this case Zyrian Komi. However, it is not obvious whether there is any particular benefit in using the Finnish model for Komi as compared to any other language pair, and further testing with different languages remains important.

Our results indicate that, at least in the case of Komi, the alignment works very well. The results are displayed in Figure 3.

Figure 3: The alignment accuracy in seconds measured from the start (a) and end (b) of the word.

This shows that the majority of the aligned words are very close to their true start and end, with the end displaying more variation. The median start difference is 0.04 seconds and the median end difference is 0.11 seconds, which also reflects the higher fluctuation at the word end. We can complement these figures with our practical experience from manually aligning the materials. For the vast majority of the aligned words both ends had to be adjusted slightly, but it was very rare that the predicted location was very far from the correct one. This makes the system functional when utterance-level alignment is wanted, or when we want to coarsely align the text recognition result with the original audio.
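The measurements behind Figure 3 amount to comparing predicted and manually verified word boundaries. A hypothetical sketch of such an evaluation, with invented timestamps standing in for our annotation data:

```python
from statistics import median

# Hypothetical evaluation sketch: each pair is (start, end) in seconds for a
# forced-aligned word and its manually verified counterpart, in the same order.
def boundary_offsets(aligned, verified):
    """Median absolute deviation of word start and end times, in seconds."""
    starts = [abs(a[0] - v[0]) for a, v in zip(aligned, verified)]
    ends = [abs(a[1] - v[1]) for a, v in zip(aligned, verified)]
    return median(starts), median(ends)

aligned  = [(0.10, 0.52), (0.60, 1.10), (1.15, 1.80)]  # invented examples
verified = [(0.12, 0.50), (0.63, 1.22), (1.18, 1.95)]
start_med, end_med = boundary_offsets(aligned, verified)
```

The median is preferable to the mean here because a handful of badly misplaced words would otherwise dominate an otherwise tight alignment.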
4. Conclusion
We have presented a pipeline that allows us to efficiently process handwritten transcriptions, transliterate them into the modern orthography, and align them with the original audio recordings. We used approximately one third of the available dataset in these experiments, which demonstrates that, at least within these constraints, our workflow is practicable. Since the constraints under which we operate are not particularly unusual, with similar archived datasets existing for numerous endangered languages around the world, we believe that these approaches can easily be extended to new environments. The full material described here will eventually be published both in print and online, and the current study is part of the reported work in progress.

The forced-aligned transcriptions can easily be converted into ELAN files [18, 19] (or the formats of similar tools used in documentary linguistics), storing the original transcription, the converted orthography, and the later manually verified and adjusted transcription on their own tiers. At this level one can easily compare these transcriptions to modern recordings and apply exactly the same annotation methods to the archival resources. Thus, we can bring archived transcription manuscripts and their recordings into the same unity as current language documentation endeavours based on new fieldwork. The scope of this extends far beyond archived manuscripts and reel-to-reel tapes: for many languages, combinations of text and unaligned audio can be found in innumerable formats and storage locations. Our workflow can easily be adjusted to most of these situations and help solve a real-world task in the current documentary linguistics of under-resourced and under-researched languages.
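The tier structure described above maps directly onto ELAN's EAF format, which is plain XML. As a rough illustration of that structure only (a reduced skeleton, not a complete EAF writer; in practice a library such as pympi or ELAN's own import tools would be used, and the example strings and timestamps are invented):

```python
import xml.etree.ElementTree as ET

def build_eaf(tiers):
    """tiers: {tier_id: [(start_ms, end_ms, value), ...]} -> EAF-like XML string.

    Sketch of the EAF skeleton: shared time slots in TIME_ORDER, plus one
    time-aligned TIER per representation (transcription, orthography, ...).
    """
    doc = ET.Element("ANNOTATION_DOCUMENT", VERSION="3.0", FORMAT="3.0")
    ET.SubElement(doc, "HEADER", TIME_UNITS="milliseconds")
    time_order = ET.SubElement(doc, "TIME_ORDER")
    slot_ids, ann_no = {}, 0
    for tier_id, annotations in tiers.items():
        tier = ET.SubElement(doc, "TIER", TIER_ID=tier_id,
                             LINGUISTIC_TYPE_REF="default-lt")
        for start, end, value in annotations:
            refs = []
            for t in (start, end):
                if t not in slot_ids:  # reuse a slot when a time value recurs
                    slot_ids[t] = f"ts{len(slot_ids) + 1}"
                    ET.SubElement(time_order, "TIME_SLOT",
                                  TIME_SLOT_ID=slot_ids[t], TIME_VALUE=str(t))
                refs.append(slot_ids[t])
            ann_no += 1
            wrapper = ET.SubElement(tier, "ANNOTATION")
            ann = ET.SubElement(wrapper, "ALIGNABLE_ANNOTATION",
                                ANNOTATION_ID=f"a{ann_no}",
                                TIME_SLOT_REF1=refs[0], TIME_SLOT_REF2=refs[1])
            ET.SubElement(ann, "ANNOTATION_VALUE").text = value
    ET.SubElement(doc, "LINGUISTIC_TYPE",
                  LINGUISTIC_TYPE_ID="default-lt", TIME_ALIGNABLE="true")
    return ET.tostring(doc, encoding="unicode")

# Illustrative call with invented content and timestamps:
eaf_xml = build_eaf({
    "transcription": [(0, 1200, "mian kołva s’ikt")],
    "orthography": [(0, 1200, "миян Колва сикт")],
})
```

Because the tiers share one TIME_ORDER, manually adjusting a boundary in ELAN keeps the parallel representations of the same word in step.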
Acknowledgements
We want to thank the Kone Foundation (Helsinki) for their support of our research projects Iźva Komi Documentation Project in 2014–2016 and Language Documentation Meets Language Technology: The Next Step in the Description of Komi in 2017–2021. We also want to thank the Institute for the Languages of Finland for giving us access to the Komi recordings used in this study. We are also grateful to all the Komi speakers and colleagues who have collaborated with us over the years.

References
[1] D. Poa, R. J. LaPolla, Minority languages of China, in: O. Miyaoka, M. E. Krauss (Eds.), The Vanishing Languages of the Pacific, Oxford University Press, 2007, pp. 337–354.
[2] J. Gippert, U. Mosel, N. Himmelmann (Eds.), Essentials of language documentation, number 178 in Trends in Linguistics. Studies and Monographs, Mouton de Gruyter, 2006.
[3] R. Blokland, N. Partanen, M. Rießler, J. Wilbur, Using computational approaches to integrate endangered language legacy data into documentation corpora: Past experiences and challenges ahead, in: Proceedings of the Workshop on Computational Methods for Endangered Languages, volume 2, 2019.
[4] N. Partanen, M. Hämäläinen, T. Klooster, Speech recognition for endangered and extinct Samoyedic languages, in: Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation, 2020.
[5] N. Hjortnaes, N. Partanen, M. Rießler, F. M. Tyers, Towards a speech recognizer for Komi, an endangered and low-resource Uralic language, in: Proceedings of the Sixth International Workshop on Computational Linguistics of Uralic Languages, 2020.
[6] E. Prud'hommeaux, R. Jimerson, R. Hatcher, K. Michelson, Automatic speech recognition for supporting endangered language documentation, Language Documentation & Conservation 15 (2021) 491–513.
[7] E. Itkonen, Komin tasavallan kielitieteeseen tutustumassa, Virittäjä 62 (1958) 66–66.
[8] M. Korhonen, Suomalais-ugrilaisen seuran vuosikertomus v.
1971, Journal de la Société Finno-Ougrienne 72 (1973) 505–512.
[9] L. Quirós, P2PaLA: Page to page layout analysis toolkit, https://github.com/lquirosd/P2PaLA, 2017. GitHub repository.
[10] E. M. Petzell, Handwritten text recognition and linguistic research, in: Proceedings of the Digital Humanities in the Nordic Countries 5th Conference, 2020.
[11] P. Kahle, S. Colutto, G. Hackl, G. Mühlberger, Transkribus – a service platform for transcription, recognition and retrieval of historical documents, in: 2017 14th IAPR International Conference on Document Analysis and Recognition, volume 4, 2017.
[12] N. Partanen, M. Rießler, An OCR system for the Unified Northern Alphabet, in: Proceedings of the Fifth International Workshop on Computational Linguistics for Uralic Languages, 2019.
[13] J. Puigcerver, C. Mocholí, PyLaia, https://github.com/jpuigcerver/PyLaia, 2018. GitHub repository.
[14] C. Gerstenberger, N. Partanen, M. Rießler, J. Wilbur, Utilizing language technology in the documentation of endangered Uralic languages, Northern European Journal of Language Technology 4 (2016) 29–47.
[15] J. Rueter, N. Partanen, M. Hämäläinen, T. Trosterud, et al., Overview of open-source morphology development for the Komi-Zyrian language: Past and future, in: Proceedings of the Seventh International Workshop on Computational Linguistics of Uralic Languages, 2021.
[16] J. Leinonen, S. Virpioja, M. Kurimo, et al., Grapheme-based cross-language forced alignment: Results with Uralic languages, in: Proceedings of the 23rd Nordic Conference on Computational Linguistics, 2021.
[17] C. Jones, W. Li, A. Almeida, A. German, Evaluating cross-linguistic forced alignment of conversational data in north Australian Kriol, an under-resourced language, Language Documentation and Conservation (2019) 281–299.
[18] ELAN (version 6.3) [computer software], 2022. Nijmegen: Max Planck Institute for Psycholinguistics, The Language Archive. Retrieved from https://archive.mpi.nl/tla/elan.
[19] H.
Brugman, A. Russel, Annotating multi-media/multi-modal resources with ELAN, in: LREC, 2004, pp. 2065–2068.

A. Online Resources
Our study has used the following online resources.
• Documentation of the Aalto-ASR system in the Language Bank of Finland's infrastructure at CSC's Puhti server [in Finnish].
• Forced alignment evaluation scripts in the Aalto-ASR GitHub project.
• Documentation: How to Train PyLaia-Models in Transkribus