=Paper=
{{Paper
|id=Vol-3834/paper128
|storemode=property
|title=Automatic Translation Alignment Pipeline for Multilingual Digital Editions of Literary Works
|pdfUrl=https://ceur-ws.org/Vol-3834/paper128.pdf
|volume=Vol-3834
|authors=Maria Levchenko
|dblpUrl=https://dblp.org/rec/conf/chr/Levchenko24
}}
==Automatic Translation Alignment Pipeline for Multilingual Digital Editions of Literary Works==
<pdf width="1500px">https://ceur-ws.org/Vol-3834/paper128.pdf</pdf>
<pre>
                                Automatic Translation Alignment Pipeline for
                                Multilingual Digital Editions of Literary Works
                                Maria Levchenko
                                Dipartimento di Filologia Classica e Italianistica, University of Bologna, Italy


                                            Abstract
                                            This paper investigates the application of translation alignment algorithms in the creation of a Multi-
                                            lingual Digital Edition (MDE) of Alessandro Manzoni’s Italian novel I promessi sposi (“The Betrothed”),
                                            with translations in eight languages (English, Spanish, French, German, Dutch, Polish, Russian and Chi-
                                            nese) from the 19th and 20th centuries. We identify key requirements for the MDE to improve both the
                                            reader experience and support for translation studies. Our research highlights the limitations of current
                                            state-of-the-art algorithms when applied to the translation of literary texts and outlines an automated
                                            pipeline for MDE creation. This pipeline transforms raw texts into web-based, side-by-side represen-
                                            tations of original and translated texts with different rendering options. In addition, we propose new
                                            metrics for evaluating the alignment of literary translations and suggest visualization techniques for
                                            future analysis.

                                            Keywords
                                            multilingual digital edition, Alessandro Manzoni, translation alignment, literary translation, embed-
                                            dings


                                1. Introduction
                                From the very beginning of digital edition creation, there has been a tendency, supported by
                                the power of web technologies, to represent not only the original text but also its translation(s),
                                following the tradition of bilingual printed editions. In this paper, we propose to define mul-
                                tilingual digital editions (MDE) as editions in which translations are not supplementary but
                                essential, intended to enrich both computational analysis and reader experience.
                                    Beyond annotated file accessibility, the MDE should meet additional criteria to be effective.
                                Primarily, the platform must display the original text alongside translations. Aligned pairs
                                should be visually correlated to facilitate straightforward comparison and analysis. Accurate
                                alignment is assumed by default: the corresponding parts of the texts must be properly aligned.
                                Furthermore, the platform should support the visual highlighting of omitted or inserted parts
                                in the translations [6], which will enable users to discern differences and interpret the
                                nuances of each translation.
                                    These requirements are generally feasible for short, structured texts like poetry or historical
                                documents (examples of MDE publishing strategies are described in Appendix A). The challenge is
                                to develop a flexible, automated system that accurately aligns complex literary texts across
                                multiple languages for computational analysis and user-friendly exploration. The technology
                                should be able to handle the complexities of literary texts, including the splitting, merging,
                                and reordering of sentences, and align text fragments of manageable length, so that users can
                                read and understand them at a glance and gain insight into the linguistic and cultural nuances
                                of each version. The automated alignment process should save researchers both time and resources.

                                CHR 2024: Computational Humanities Research Conference, December 4–6, 2024, Aarhus, Denmark
                                maria.levchenko@studio.unibo.it (M. Levchenko)
                                https://mary-lev.github.io/ (M. Levchenko)
                                ORCID 0000-0002-0877-7063 (M. Levchenko)
                                © 2024 Copyright for this paper by its author. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
   For the MDE of Alessandro Manzoni’s novel “I promessi sposi” (The Betrothed), we propose an
automatic translation alignment pipeline that adapts state-of-the-art alignment techniques to
the objectives of a multilingual digital edition of literary works for educational and research
purposes.


2. The Betrothed by Alessandro Manzoni and Its Translations
A comparative analysis of translations of the same literary work over time can provide valuable
insights into the evolution of interpretation and understanding. “I promessi sposi” is particu-
larly compelling in this context. Not only does it reflect the author’s exploration of the Italian
language during a period of significant linguistic evolution, but it has also been translated into
many European languages over the past two centuries. This makes it an ideal case study for in-
vestigating the influence of temporal factors, linguistic shifts, and the reception of the original
novel in different cultural contexts.
   Two main original editions (1827 and 1840) were translated into European languages and
published in parallel in the 19th century. For the development of an automated translation
alignment pipeline, we selected and prepared the texts of a wide range of translations of the
classic edition of the novel, also known as the Quarantana (1840): English translations from
1845, 1876, 1983, and 2022; Russian translations from 1854 and 1999; and single translations
into Dutch (1849), German (1884), French (1874), Spanish (1858), Polish (1882), and Chinese
(1998) (see Appendix 1 for a list of translations).


3. Related Work
The core of the MDE creation process is translation alignment, which involves mapping
corresponding units (typically words or sentences) between a source and target text.
State-of-the-art alignment algorithms have evolved significantly in recent years and now
perform well in many applications, including machine translation, bilingual dictionary
creation, and parallel corpus development.
   Modern methods have moved from statistical approaches [5, 20, 18, 14, 15, 10] and lexical
associations (Hunalign in [22]), first to the use of machine translation (MT) systems and then
to alignment systems built on multilingual sentence embeddings, which significantly improve
accuracy (LASER in [2] and LaBSE in [8]). Thompson and Koehn’s Vecalign [21] uses LASER
embeddings and a recursive dynamic programming approach to achieve state-of-the-art results
while reducing complexity from quadratic to linear. These methods use multilingual models to
generate embeddings for each sentence, which are then compared using cosine similarity to find
the best matches between the original and translated sentences. Liu and Zhu [17] introduced
Bertalign, which uses LaBSE vectors and demonstrated superior performance on the Bible One
dataset and an English-Chinese literary corpus.

[Figure 1 (chart): "Alignment Types Across Different Translations". x-axis: Translations;
y-axis: Total Alignments (0 to 8000); series: One-to-One, One-to-Many, Many-to-One,
Many-to-Many.]
Figure 1: Alignment Types in The Betrothed


4. Challenges of the Sentence-Level Alignment
While the alignment of text and translation at the line level is sufficient for poetic, historical,
and even verse dramatic texts (see [1]), where we cannot expect significant variation in the
splitting, merging, or reordering of lines, this approach is inadequate for prose because of the
extent of restructuring that inevitably occurs in literary prose translation. In such cases, the
standard approach is sentence-level alignment. However, it can be challenging, particularly for
literary translations, because the syntactic structure of the original text is rendered
irregularly in another language. Literary translators are not limited to translating a single
sentence into another single sentence (a one-to-one type of alignment) but are free to manage
sentence boundaries and reconfigure sentence structures to better convey the meaning and style
of the original text. In this case, in working to achieve the highest similarity score for the
aligned pairs, the alignment algorithms are forced to combine several sentences into one, using
one-to-many, many-to-one, and many-to-many alignment types.
   The ideal alignment type for the MDE is one-to-one, which maintains the granularity and
consistency of the alignment. In our analysis of the sentence-level alignment of I promessi
sposi (see Figure 1 for the different translations), one-to-one alignments are the most common,
but a significant proportion are more complex types. This does not inherently complicate the
alignment process, as advanced tools such as Bertalign and Vecalign can handle this complexity,
but the results may be less meaningful. The aligned pairs become longer, including several
sentences from both the source text and the translation (examples can be seen in Appendix B).
The edge case for this expanded alignment result would be the pairing of a whole paragraph or
even a chapter of the original text with its counterpart in the translation.


Figure 2: The visualization of the alignment of the long sentence.


    For this reason, traditional metrics may not be sufficient for evaluating the alignment results.
The performance of alignment algorithms is typically evaluated using established metrics such
as precision, recall, F1 score, and Alignment Error Rate (AER) [23]. The first limitation of this
approach is that it is based on a “gold dataset,” which does not provide insight into the
performance of the algorithm on other types of text [9]. The second limitation is that the
scores may be high while the results are still unsuitable for the MDE, because the aligned pairs
are too large to be analyzed or identified at a glance by a human observer. We therefore suggest
that, in addition to the distribution of alignment types (one-to-one, one-to-many, many-to-one,
many-to-many), the number and length of alignment pairs derived from the original sentences
should also be considered as metrics of the acceptability of the results. A number of aligned
pairs close to the number of original sentences would indicate an effective alignment process.
Conversely, a significant reduction in the number of aligned pairs would indicate limitations of
the sentence-level alignment approach, as it implies that the alignment algorithm is forced to
combine more sentences to obtain the appropriate similarity score.
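
The two counts proposed above can be sketched in a few lines of Python. This is a minimal
illustration, not the code used for the edition: the function names, the representation of an
aligned pair as two lists of sentence indices, and the toy data are all assumptions made here.

```python
from collections import Counter


def alignment_type(src_ids, tgt_ids):
    """Classify one aligned pair by the number of sentences on each side."""
    if not src_ids or not tgt_ids:
        return "unaligned"  # an omission or an insertion
    if len(src_ids) == 1 and len(tgt_ids) == 1:
        return "one-to-one"
    if len(src_ids) == 1:
        return "one-to-many"
    if len(tgt_ids) == 1:
        return "many-to-one"
    return "many-to-many"


def alignment_profile(pairs, n_source_sentences):
    """Return the distribution of alignment types and the pair/sentence ratio.

    pairs: list of (source_sentence_ids, target_sentence_ids) tuples
           (a hypothetical format chosen for this sketch).
    """
    counts = Counter(alignment_type(src, tgt) for src, tgt in pairs)
    return {
        "type_counts": dict(counts),
        # A ratio close to 1.0 means few source sentences were merged.
        "pair_ratio": len(pairs) / n_source_sentences,
    }


# Toy example: 7 source sentences collapsed into 5 aligned pairs.
pairs = [([0], [0]), ([1, 2], [1]), ([3], [2, 3]), ([4, 5], [4, 5]), ([6], [6])]
profile = alignment_profile(pairs, n_source_sentences=7)
print(profile)
```

A ratio well below 1.0, as in the toy data, would signal that the aligner merged many source
sentences to reach its similarity score.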
    The length of alignment pairs indicates whether they are suitable for human readers. In the
context of creating a digital edition for educational or research purposes with multiple
languages, it is not advisable to present long aligned texts, given the limited attention span
and working memory of the readers (for further insight, see studies of working memory and
comprehension with multiple text reading [11, 12]).
    To illustrate, if a sentence in the source language (Italian) is 130 tokens long and its
corresponding sentence in the target language (English) is 140 tokens long, readers may
encounter difficulties in comparing and understanding such lengthy segments. Even the use of
color differentiation to highlight aligned pairs does not overcome this challenge (see Figure 2).
    In summary, in the context of the MDE of literary works, sentence-level alignment still faces
significant challenges: 1) sentence boundaries are not stable across languages, which leads to a
variety of alignment types and prevents consistent alignment across the MDE; 2) strict
sentence-level alignment does not fully reflect the variability of the translated texts, such
as inserted or omitted parts; and 3) it strains readers’ attention spans and working memory
and fails to achieve an alignment granularity that is comfortable for the overall reading
experience. The alignment process needs to be modified to address these challenges to automated
processing and readability.


5. Sentence Segmentation as an Alternative Solution
Alternative segmentation methods can provide more accurate and meaningful segmentation of literary
texts. To move from sentence-level alignment to phrase- or segment-level alignment, we consider
two promising approaches:

    • Punctuation splitting: Punctuation marks (such as commas, periods, and semicolons) are
      used to create initial segments. This method provides natural breaks in the text,
      preserving the contextual meaning. However, when we aligned the resulting segments with
      Bertalign, we achieved a more granular alignment but increased the number of reordering
      problems that did not occur with sentence-level alignment.
    • Zero/Few-Shot Prompting with LLMs: The sentences of the original text are segmented by
      zero-shot prompting of the OpenAI GPT-4o model [19]. The approved segments are then used
      as patterns for few-shot prompting to segment the sentences of the translations. This
      approach provides a robust foundation for universal alignment.
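
The punctuation-splitting approach can be illustrated with a short Python sketch. It is not the
implementation used in the pipeline: the helper name split_on_punctuation, the set of
punctuation marks, and the min_words merging rule are assumptions introduced here to keep very
short fragments from becoming segments of their own.

```python
import re

# Split at whitespace that follows a comma, semicolon, colon, period,
# question mark, or exclamation mark. The mark stays with the preceding segment.
_SPLIT_RE = re.compile(r"(?<=[,;:.!?])\s+")


def split_on_punctuation(sentence, min_words=3):
    """Split a sentence into segments at punctuation marks.

    A piece is glued onto the previous segment when that segment is still
    shorter than min_words (a hypothetical rule, not taken from the paper).
    """
    pieces = [p for p in _SPLIT_RE.split(sentence.strip()) if p]
    segments = []
    for piece in pieces:
        if segments and len(segments[-1].split()) < min_words:
            segments[-1] = segments[-1] + " " + piece
        else:
            segments.append(piece)
    return segments


sentence = (
    "Presto, io spero, potrete ritornar sicuri a casa vostra; "
    "a ogni modo, Dio vi provvederà, per il vostro meglio;"
)
segments = split_on_punctuation(sentence)
for segment in segments:
    print(segment)
```

Because every split happens at whitespace and nothing is dropped, joining the segments with
single spaces restores the input sentence, which keeps token identifiers in the source stable.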

The similarity score can be visualized to evaluate segment-level alignment and compare its
results with traditional sentence-level alignment. In addition, the visual representation of the
similarity score of the aligned segments or sentences allows us to find the semantic outliers.
   After extracting high-dimensional embeddings for each aligned line from the original and
translated text using the multilingual model (LaBSE), we applied t-Distributed Stochastic Neigh-
bour Embedding (t-SNE) to reduce them to two dimensions. By visually examining the cosine
similarity, we can detect anomalies and curious translated fragments (see Figure 3), even if the
alignment algorithm establishes the correlation between sentences.
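
As a complement to visual inspection, low-similarity pairs can also be flagged numerically. The
sketch below assumes that an embedding vector has already been computed for each side of every
aligned pair (for example with LaBSE); the function names, the threshold of 0.5, and the toy
three-dimensional vectors are illustrative assumptions, and the t-SNE projection itself is not
reproduced here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_outliers(src_vectors, tgt_vectors, threshold=0.5):
    """Return (pair_index, similarity) for aligned pairs scoring below threshold."""
    flagged = []
    for i, (u, v) in enumerate(zip(src_vectors, tgt_vectors)):
        score = cosine_similarity(u, v)
        if score < threshold:
            flagged.append((i, score))
    return flagged


# Toy vectors standing in for sentence embeddings of three aligned pairs.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
tgt = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [1.0, 0.9, 0.1]]
outliers = flag_outliers(src, tgt)
print(outliers)
```

Pairs flagged this way are candidates for manual inspection rather than confirmed errors, since
a free but legitimate translation can also score low.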
   There are several quantitative metrics that can be used to assess the quality of alignment:

    • Number of aligned pairs: The number of resulting aligned pairs with respect to the number
      of input segments.
    • Consistency: Whether segments are consistently aligned across languages, preserving the
      meaning and context of the original text.
    • Number of clusters: The number of clusters indicates how the sentences are grouped. An
      ideal alignment would result in clusters where each cluster contains two points
      corresponding to the embeddings of the aligned pair, one from the original text and one
      from the translation, placed close to each other.
    • Average similarity: Calculating the average similarity within clusters gives an insight
      into the coherence of the alignments, with higher values indicating more semantically
      consistent groupings.
    • Length of aligned pairs: The length of the aligned pair of lines indicates whether the
      detected lines are within the reader’s attention span and suitable for the user experience.
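
The length criterion from the list above can be computed directly from the aligned texts. The
following sketch is only an illustration under stated assumptions: words are counted by
whitespace splitting (which does not work for Chinese), and the max_words threshold and the
function name pair_length_stats are hypothetical, not values or names from the edition.

```python
def pair_length_stats(pairs, max_words=40):
    """Summarize the length of aligned pairs in whitespace-separated words.

    pairs: list of (source_text, translation_text) tuples.
    max_words: assumed readability threshold, not a value from the paper.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    src_lengths = [len(src.split()) for src, _ in pairs]
    tgt_lengths = [len(tgt.split()) for _, tgt in pairs]
    too_long = sum(
        1 for s, t in zip(src_lengths, tgt_lengths) if max(s, t) > max_words
    )
    n = len(pairs)
    return {
        "mean_source_words": sum(src_lengths) / n,
        "mean_translation_words": sum(tgt_lengths) / n,
        "share_too_long": too_long / n,
    }


# Two aligned segments from the Russian translation of chapter 1.
pairs = [
    ("Con tutto ciò,", "Несмотря на все,"),
    ("anzi in gran parte a cagion di ciò,", "и может быть, потому именно,"),
]
stats = pair_length_stats(pairs, max_words=5)
print(stats)
```

Running the same function on sentence-level and segment-level output for one chapter gives a
direct comparison of how much the segment-level approach shortens the pairs.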


[Figure 3 (scatter plot): "Similarity visualisation of sentence embeddings in the Chapter 23".
Axes: x, y; legend "language": german, italian.]

Figure 3: Similarity visualisation for sentence-level alignment in the German translation of chapter
23. This translation omits Don Abbondio’s inner monologue, which is not captured in the sentence-
level alignment, but is evident in the visualisation, where several Italian sentences appear without
corresponding German pairs.


   Comparing sentence-level and segment-level alignment, we can assume that, although even
sentence-level alignment provides valuable insights into the differences between the translations
and the original text (see Appendix D for examples in the Spanish translation of I promessi
sposi), segment-level alignment allows us to go deeper and capture more nuanced variations
between the original and the translated text. For example, we can identify the omission of the
end of chapter 8 in the German translation (see Table 1) and two omissions in the Russian
translation of chapter 1 (see Tables 2 and 3), which can be interpreted through the lens of
cultural differences and/or censorship and could not be captured by sentence-level alignment.


                     Table 1: Italian / German segment-level alignment
   Original                                          German 1880
   Presto, io spero, potrete ritornar sicuri a       Ich hoffe, ihr werdet bald ohne Gefahr in
   casa vostra;                                      euer Haus zurückkehren können;

   a ogni modo, Dio vi provvederà, per il            in jedem Falle wird Gott Alles zu euerm
   vostro meglio;                                    Besten lenken.

   e io certo mi studierò di non mancare alla
   grazia che mi fa, scegliendomi per suo
   ministro, nel servizio di voi suoi poveri
   cari tribolati.

   Voi,» continuò volgendosi alle due donne,         Und ihr», fuhr er, zu den beiden Frauen
   «potrete fermarvi a ***.                          gewandt, fort, «ihr könnt euch in *** so
                                                     lange aufhalten.


                     Table 2: Italian / Russian segment-level alignment
   Original                                          Russian 1854
   Ai tempi in cui accaddero i fatti che             Во время тех событий, которые мы
   prendiamo a raccontare, quel borgo, già           намерены описать, Лекко было уже
   considerabile, era anche un castello,             значительным местечком и маленькой
                                                     крепостцей;

   e aveva perciò l’onore d’alloggiare un            вследствие чего в нем жили комендант
   comandante, e il vantaggio di possedere una       и постоянный гарнизон испанских
   stabile guarnigione di soldati spagnoli,          солдат,

   che insegnavan la modestia alle fanciulle e
   alle donne del paese, accarezzavan di tempo
   in tempo le spalle a qualche marito, a
   qualche padre; e, sul finir dell’estate,

   non mancavan mai di spandersi nelle vigne,        которые занимались собиранием
   per diradar l’uve, e alleggerire a’               винограда.
   contadini le fatiche della vendemmia.


                     Table 3: Italian / Russian segment-level alignment
   Original                                          Russian 1854
   Con tutto ciò,                                    Несмотря на все,

   anzi in gran parte a cagion di ciò,               и может быть, потому именно,

   quelle gride, ripubblicate e rinforzate di
   governo in governo, non servivano ad altro
   che ad attestare ampollosamente l’impotenza
   de’ loro autori,

   o, se producevan qualche effetto                  если декреты имели минутную
   immediato...                                      действительность...

  By providing a more granular and accurate alignment, the segment-level approach also al-
lows the length of aligned pairs to be reduced (increasing their number) and makes the MDE
more suitable for reader reception compared to sentence-level alignment (see Figure 4).


[Figure 4 (chart): "Average Lengths for Original and Translation Aligned Pairs in Spanish_1858".
x-axis: Intro, Cap1 to Cap38; y-axis: Average Length (in Words); series: Original Segment Length,
Translated Segment Length, Original Sentence Length, Translated Sentence Length.]
Figure 4: Reducing the Lengths of Aligned Pairs for the Spanish 1858


6. Multilingual Digital Edition Pipeline
The automated pipeline for the MDE is proposed as a means of enabling creators to prepare
annotated TEI files that are accessible, adaptable, correct, easily parsed by computational tools,
and rendered for readers (see Figure 5).


Figure 5: The Translation Alignment Pipeline


  We start with the raw texts of the translations in TXT format, obtained after OCR and error
checking. For Manzoni’s text, we used TEI files with identifiers assigned to each token. This
preparation allows us to take into account the irregular segmentation to be expected due to
inconsistencies across the languages.
   Step 1. The choice of the segmentation method. Based on the above analysis and the
specifics of the texts to be published, the MDE developers can select the segmentation method-
ology in accordance with the projected audience and the project’s objectives, enabling align-
ment at the sentence, phrase, or word level, or a combination, giving readers the flexibility to
choose their preferred option.
   Step 2-3. Segmentation of the original text and the translations. Depending on the
decision made in the first step, the text can be split into sentences, segments, or even tokens.
   Step 4. Applying alignment algorithms. By default, we apply Bertalign with the LaBSE
model, trained on 109 languages [7], to the segments obtained in the previous steps. Other
multilingual sentence-transformer models, such as BGE M3-Embedding, can also be used [4].
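
To make the idea of embedding-based alignment concrete, the following sketch runs a plain
dynamic program over cosine similarities and allows one-to-one, one-to-many, many-to-one and
many-to-many pairs as well as omissions. It is a deliberately simplified stand-in, not
Bertalign's actual search procedure; the names `align`, `bead_types`, `max_merge` and
`skip_score` are hypothetical. In practice `embed` would wrap a multilingual sentence encoder
such as LaBSE; here a toy bag-of-words function keeps the example self-contained.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def align(src, tgt, embed, max_merge=2, skip_score=0.0):
    """Monotonic alignment of two segment lists.

    Returns a list of (source_indices, target_indices) pairs; an empty list on
    one side marks an omission or an addition.
    """
    n, m = len(src), len(tgt)
    # Allowed pair shapes: merges of up to `max_merge` segments per side, plus skips.
    beads = [(di, dj) for di in range(1, max_merge + 1)
             for dj in range(1, max_merge + 1)] + [(1, 0), (0, 1)]
    neg = float("-inf")
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in beads:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or score[pi][pj] == neg:
                    continue
                if di and dj:
                    # Merged segments are embedded as one concatenated string.
                    gain = cosine(embed(" ".join(src[pi:i])),
                                  embed(" ".join(tgt[pj:j])))
                else:
                    gain = skip_score
                if score[pi][pj] + gain > score[i][j]:
                    score[i][j] = score[pi][pj] + gain
                    back[i][j] = (di, dj)
    # Follow the back-pointers from the end of both texts to the start.
    pairs, i, j = [], n, m
    while i or j:
        di, dj = back[i][j]
        pairs.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return pairs[::-1]


def bead_types(alignment):
    """Label each pair as 'k-l' (k source segments, l target segments)."""
    return [f"{len(s)}-{len(t)}" for s, t in alignment]


def toy_embed(text, vocab=("a", "b", "c")):
    """Toy embedding for demonstration only: word counts over a tiny vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]
```

With `align(["A", "B", "C"], ["a b", "c"], toy_embed)` the first two source segments are
merged against the first target segment, a 2-1 pair of the kind illustrated in Appendix C; a
source segment without any counterpart is returned with an empty target list, as in the
omissions of Appendix D.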
   Step 5. Choosing the encoding approach. The encoding approach determines the flexi-
bility of the alignment description for future rendering and for establishing a link between the
original and translated texts. For structured texts, where each segment in the original closely
corresponds to an equivalent segment in the translations, it may be appropriate to mark each
segment with the same identifier. Given the complexity of multilingual alignment, we have
taken a different approach. The TEI encoding of the original text includes identifiers for each
token, providing granular reference points. The TEI-encoded translation text is divided into
segments, each referencing the start and end identifiers from the original text, allowing for
flexible and accurate alignment.
   Step 6. Encoding. By iterating over the alignment results, we assign the referencing start
and end identifiers from the original text to each aligned segment from the translation and
generate a new TEI file for the translation, ensuring that all segments are accurately linked to
the corresponding elements.
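
The encoding described in Steps 5 and 6 can be sketched as follows. This is an illustration
under stated assumptions rather than the project's actual schema: the element name `seg` and
the attribute names `n`, `from` and `to` are placeholders, as are the function name and the
shape of its inputs (an alignment given as pairs of index lists, and the token identifiers of
each original segment).

```python
import xml.etree.ElementTree as ET


def encode_translation(alignment, src_token_ids, tgt_segments):
    """Build a <body> whose <seg> children point back into the original.

    alignment     -- list of (source_indices, target_indices) pairs
    src_token_ids -- for each original segment, the ids of its tokens in order
    tgt_segments  -- the translation segments as plain strings
    """
    body = ET.Element("body")
    for k, (src_idx, tgt_idx) in enumerate(alignment, start=1):
        if not tgt_idx:
            continue  # omitted in the translation: there is nothing to encode
        seg = ET.SubElement(body, "seg", {"n": str(k)})
        seg.text = " ".join(tgt_segments[j] for j in tgt_idx)
        if src_idx:
            # First token of the first aligned original segment ...
            seg.set("from", "#" + src_token_ids[src_idx[0]][0])
            # ... and last token of the last aligned original segment.
            seg.set("to", "#" + src_token_ids[src_idx[-1]][-1])
    return body
```

Calling `ET.tostring(body, encoding="unicode")` on the result serializes the segments. A
segment that the translator added without any counterpart in the original is written without
the two pointer attributes.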
   Step 7. Rendering aligned texts on the web. The original and translated texts are rendered
from the TEI files as two columns on a web page, with separate XSLT templates for the
original and the translation. In this interactive interface, clicking on the original text
highlights the corresponding translation fragment, giving readers an intuitive way to
explore and compare the translations side by side.
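
Highlighting on click presupposes a lookup from each original token to the translation
segments whose start and end references enclose it. The sketch below shows only that lookup
logic, in Python for illustration; it is not the edition's XSLT or front-end code, and the
function name and data shapes are hypothetical.

```python
def build_token_index(token_ids, segments):
    """Map every original token id to the translation segments that cover it.

    token_ids -- token ids of the original in document order
    segments  -- (segment_id, start_token_id, end_token_id) triples taken from
                 the encoded translation
    """
    position = {tid: i for i, tid in enumerate(token_ids)}
    index = {tid: [] for tid in token_ids}
    for seg_id, start, end in segments:
        # Every token between the start and end references belongs to the segment.
        for i in range(position[start], position[end] + 1):
            index[token_ids[i]].append(seg_id)
    return index
```

A click on token `w2` would then highlight every segment listed in `index["w2"]`; tokens
that no segment covers map to an empty list, which makes omissions visible to the reader.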
   Step 8. Visualization and evaluation. While highly unstable text versions [13] or
line-level aligned translations [16] can be effectively visualized with a Sankey diagram or a
bipartite graph, the alignment results for modern translations can be visualized with the
approach described above, which projects the embedding vectors with t-SNE and clusters them
with DBSCAN. As for the presentation in the user interface, ideally, all multilingual
translations should be comparable and aligned with each other, allowing the user to see and
interpret the differences.


7. Future Development and Challenges
    • Current alignment algorithms face challenges in accurately aligning segments with re-
      ordered content. Future work will focus on improving the alignment performance in
      such cases, ensuring more precise matches even when the original and translated texts
      differ significantly in structure.
    • Previous studies on user behavior in digital editions have analyzed log files to understand
      interaction patterns [3]. To gain deeper insights, we are using advanced tools such as
      ReactFlow to study more comprehensively how users interact with different elements of
      MDEs. For example, when readers view two lines in different languages side by side, the
      optimal reading span may differ from traditional reading practices. By analysing user
      interactions, we aim to determine the most effective segment length for the MDEs.


8. Conclusion
The proposed pipeline aims to improve the development of Multilingual Digital Editions (MDEs)
by ensuring that each edition is both methodologically robust and user-centered. By prioritizing
user experience and usability, the pipeline adapts existing computational methods and algorithms
to the specific needs of educational and research applications.
   We have also proposed new metrics for MDEs that focus on the consistency, meaningfulness
and granularity of the alignment. These metrics assess the suitability of an alignment for edu-
cational and research purposes. By ensuring that the alignment is accessible to human readers
while supporting translation studies, the pipeline balances conciseness and reader engagement.


References
 [1] M. Alharbi, T. Cheesman, and R. S. Laramee. “AlignVis: Semi-automatic Alignment and
     Visualization of Parallel Translations”. In: 2020 24th International Conference Information
     Visualisation (IV). 2020, pp. 98–108. doi: 10.1109/iv51561.2020.00026.
 [2] M. Artetxe and H. Schwenk. “Margin-based Parallel Corpus Mining with Multilingual
     Sentence Embeddings”. In: Proceedings of the 57th Annual Meeting of the Association for
     Computational Linguistics. Ed. by A. Korhonen, D. Traum, and L. Màrquez. Florence, Italy:
     Association for Computational Linguistics, 2019, pp. 3197–3203. doi: 10.18653/v1/P19-1309.
 [3] A. Baillot and A. Busch. “Editing for Man and Machine. Digital Scholarly Editions and
     their Users”. In: Variants (2021), pp. 175–187. doi: 10.4000/variants.1220.
 [4] J. Chen, S. Xiao, P. Zhang, K. Luo, D. Lian, and Z. Liu. BGE M3-Embedding: Multi-Lingual,
     Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distilla-
     tion. 2024. doi: 10.48550/arXiv.2402.03216. eprint: 2402.03216 (cs.CL).
 [5] K. W. Church. “Char_align: A Program for Aligning Parallel Texts at the Character Level”.
     In: 31st Annual Meeting of the Association for Computational Linguistics. Columbus, Ohio,
     USA: Association for Computational Linguistics, 1993, pp. 1–8. doi: 10.3115/981574.981575.
 [6] G. Crane, A. Babeu, L. Cerrato, A. Parrish, C. Penagos, F. Shamsian, J. Tauber, and J.
     Wegner. “Beyond translation: engaging with foreign languages in a digital library”. In:
     International Journal on Digital Libraries 24 (2023). doi: 10.1007/s00799-023-00349-2.
 [7] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, and W. Wang. “Language-agnostic BERT Sentence
     Embedding”. In: CoRR abs/2007.01852 (2020). doi: 10.48550/arXiv.2007.01852.
     eprint: 2007.01852.
 [8] F. Feng, Y. Yang, D. Cer, N. Arivazhagan, and W. Wang. “Language-agnostic BERT Sen-
     tence Embedding”. In: Proceedings of the 60th Annual Meeting of the Association for Com-
     putational Linguistics (Volume 1: Long Papers). Ed. by S. Muresan, P. Nakov, and A. Villav-
     icencio. Dublin, Ireland: Association for Computational Linguistics, 2022, pp. 878–891.
     doi: 10.18653/v1/2022.acl-long.62.
 [9] A. Fraser and D. Marcu. “Squibs and Discussions: Measuring Word Alignment Quality
     for Statistical Machine Translation”. In: Computational Linguistics 33.3 (2007), pp. 293–
     303. doi: 10.1162/coli.2007.33.3.293.
[10]   W. A. Gale and K. W. Church. “A Program for Aligning Sentences in Bilingual Corpora”.
       In: Computational Linguistics 19.1 (1993). Ed. by J. Hirschberg, pp. 75–102.
       url: https://aclanthology.org/J93-1004.
[11]   C. Hahnel, F. Goldhammer, U. Kroehne, N. Mahlow, C. Artelt, and C. Schoor. “Automated
       and controlled processes in comprehending multiple documents”. In: Studies in Higher
       Education 46 (2021), pp. 2074–2086. url: https://api.semanticscholar.org/CorpusID:237538010.
[12]   L. Hildenbrand and J. Wiley. “Working memory capacity as a predictor of multiple text
       comprehension”. In: Discourse Processes 60.4-5 (2023), pp. 378–396.
       doi: 10.1080/0163853x.2023.2197690.
[13]   S. Jänicke and D. J. Wrisley. “Interactive Visual Alignment of Medieval Text Versions”. In:
       2017 IEEE Conference on Visual Analytics Science and Technology (VAST) (2017), pp. 127–
       138. url: https://api.semanticscholar.org/CorpusID:39979643.
[14]   O. Kraif. “Exploitation des cognats dans les systèmes d’alignement bi-textuel : archi-
       tecture et évaluation”. In: Revue TAL : traitement automatique des langues 42.3 (2001),
       pp. 833–867.
[15]   F. Lamraoui and P. Langlais. “Yet Another Fast, Robust and Open Source Sentence
       Aligner. Time to Reconsider Sentence Alignment?” In: XIV Machine Translation Summit.
       Nice, France, 2013, pp. 77–84.
[16]   R. S. Laramee, S. J. Walton, and X. Liu. “Interactive Visualisation of Shakespeare’s Othello”.
       MA thesis. Swansea University, 2018. url: https://api.semanticscholar.org/CorpusID:70244581.
[17]   L. Liu and M. Zhu. “Bertalign: Improved word embedding-based sentence alignment for
       Chinese–English parallel corpora of literary texts”. In: Digital Scholarship in the Human-
       ities 38.2 (2023), pp. 621–634. doi: 10.1093/llc/fqac089.
[18]   T. Mcenery and M. Oakes. “Sentence and word alignment on the CRATER project: meth-
       ods and assessment”. In: Proceedings of the Association for Computational Linguistics
       Workshop SIG-DAT Workshop (1995), pp. 104–116.
[19]   OpenAI. ChatGPT (May 13 version). https://chat.openai.com. 2024.
[20]     M. Simard, G. F. Foster, and P. Isabelle. “Using cognates to align sentences in bilingual
         corpora”. In: Proceedings of the Fourth Conference on Theoretical and Methodological Issues
         in Machine Translation of Natural Languages. Montréal, Canada, 1992, pp. 67–81.
         url: https://aclanthology.org/1992.tmi-1.7.
[21]    B. Thompson and P. Koehn. “Vecalign: Improved Sentence Alignment in Linear Time
        and Space”. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Lan-
        guage Processing and the 9th International Joint Conference on Natural Language Process-
        ing (EMNLP-IJCNLP). Ed. by K. Inui, J. Jiang, V. Ng, and X. Wan. Hong Kong, China:
        Association for Computational Linguistics, 2019, pp. 1342–1348. doi: 10.18653/v1/D19-1136.
[22]     D. Varga, P. Halácsy, A. Kornai, N. Viktor, N. Laszlo, N. László, and T. Viktor. “Parallel
         corpora for medium density languages”. In: Recent Advances in Natural Language Processing
         IV: Selected papers from RANLP 2005. 2007, pp. 292–247.
         url: https://api.semanticscholar.org/CorpusID:13133927.
[23]     T. Yousef, G. Heyer, and S. Jänicke. “EVALIGN: Visual Evaluation of Translation Align-
         ment Models”. In: Proceedings of the 17th Conference of the European Chapter of the As-
         sociation for Computational Linguistics: System Demonstrations. Ed. by D. Croce and L.
         Soldaini. Dubrovnik, Croatia: Association for Computational Linguistics, 2023, pp. 277–
         297. doi: 10.18653/v1/2023.eacl-demo.31.


Acknowledgments
This research was supported by the Dipartimento di Filologia Classica e Italianistica, University
of Bologna, as part of the project “Manzoni online2: manoscritti e documenti inediti, tradizione
e traduzioni” (CUP J34I19003370001, project code 2017CFZFAY_003). For more information on
the Leggo Manzoni project, visit https://projects.dharc.unibo.it/leggomanzoni.


Appendixes

A. Strategies of Text/Translation Representation in MDE

Table 4: Strategies of Text/Translation Representation in Multilingual Digital Editions

    Project Name                    Alignment Type  Comparison  Notes

    1. Separate pages for the text and the translation
    Decameron web                   -               -
    La entretenida by Miguel de     -               -
    Cervantes

    2. Same page for the text and the translation with JS switcher
    Furnace and Fugue               -               -

    3. Side-by-side viewer for the text and translation
    Cantar de mio Cid               -               +
    Vincent van Gogh. The Letters   -               +
    Ancrene Wisse Preface           -               +
    The Codex Sinaiticus Project    -               +
    Dafydd ap Gwilym.net            -               +
    Princeton Dante Project         -               +
    Ein Sermon von Ablass und Gnade -               +
    Secrets of Craft and Nature     -               +
    Notes for all projects in this group: Editions include different modes of the text on a
    single page with a side-by-side view, including the translation. Typically, there is no
    alignment, but the text length is usually just one page, making comparison straightforward.
    These viewers enable comparison of two or more versions of the text.

    4. Interlinear text/translation alignment
    Codex Suprasliensis             lines           +           The original text in Old Church Slavonic
                                                                is directly followed by its
                                                                corresponding parallel Greek text.

    5. Dynamic alignment display
    Electronic Beowulf              lines           +           When the special view type and option
                                                                are selected, and the user hovers the
                                                                mouse over a line, the translation
                                                                appears in a special area.
    Kassák Lajos: The Horse Dies    lines           +           The page displays side-by-side views of
    the Birds Fly Away                                          the original text and its translations
                                                                into two other languages, highlighting
                                                                the corresponding translated line when
                                                                the mouse hovers over the original line.
    The Community of the Realm in   sentences       +           The side-by-side viewer of the Latin
    Scotland                                                    text with its English translation aligns
                                                                the sentences, allowing users to click
                                                                on the sentence number in the original
                                                                text, which automatically scrolls the
                                                                other side of the page to the
                                                                corresponding sentence.
    Tabula Salomonis                lines           +           The TEI Publisher tool allows the user
                                                                to highlight corresponding parts and
                                                                automatically scroll when hovering over
                                                                the lines.


B. Translations
    • (English 1845). The Betrothed Lovers: A Milanese Story of the Seventeenth Century. With
      the Column of Infamy. By Alessandro Manzoni. In Three Volumes. Henry Francis C.
      Logan. London: Longman, Brown, Green, and Longmans, Paternoster-Row.
    • (English 1876). The Betrothed, by Alessandro Manzoni. London, G. Bell and Sons, 1876.
    • (English 1983). Alessandro Manzoni, The Betrothed, Bruce Penman (tr.), Penguin Ran-
      dom House UK. London, 1983.
    • (English 2022). The Betrothed. A novel, translated and with Introduction of Michael
      Moore, Preface by Pulitzer Prize-Winning Author Jhumpa Lahiri, Modern Library, 2022.
    • (Russian 1854). Обрученные : Медиолан. быль XVIII [!XVII] столетия, найден. и
      передел. Александром Манзони / Пер. с итал. В.С. Межевича. Ч. 1-4. Москва,
      1854. 4 т.; 20. (Библиотека романов, повестей, путешествий и записок, изд. Н.Н.
      Улитиным; Вып. 7, т. 1-2, 6-7).
    • (Russian 1999). Обрученные [Повесть из истории Милана XVII в.] / А. Мандзони;
      [Пер. с итал. под ред. Н. Георгиевской, А. Эфроса]. Москва: Терра-Книжный клуб,
      1999.
    • (Dutch 1849). De verloofden: eene Milanesche geschiedenis uit de zeventiende eeuw. Vol.
      1. Translated by Petrus Van Limburg Brouwer. Groningen, Van Boekeren, 1849.
    • (German 1884). Die Verlobten: eine Mailändischer Geschichte aus dem 17. Jahrhundert,
      Volume 1. 3rd ed. Regensburg, G.J. Manz, 1884.
    • (French 1874). Les fiancés: histoire milanaise du XVIIe siècle / Alexandre Manzoni;
      traduite de l’italien par Rey Dussueil. Paris: Charpentier, 1874.
  • (Spanish 1858). Los desposados: historia milanesa del siglo XVII traducida del italiano,
    Volume 1. México: Imp. de Andrade y Escalante, 1858.
  • (Polish 1882). Narzeczeni. Powieść medyolańska z XVII stulecia ze starego rękopisu
     spisana i przerobiona, tłum. Maria z Siermiradzkich Obrąpalska. Warszawa, 1882.
  • (Chinese 1998). Yuehun Fufu / (Yidali) Mengzuoni (Manzoni, A.) zhu; Zhang Shihua yi.
    - Nanjing: Yilin chubanshe, 1998.10.


C. Examples of Many-to-Many Alignment Type


                        Table 5: Italian / French: 3-1 alignment type
  Italian                                          French
  1 Sì; ma com’è dozzinale! com’è sguaiato!        1 Oui; mais comme il est commun! comme
  com’è scorretto!                                 il est inégal! comme il est incorrect! id-
  2 Idiotismi lombardi a iosa, frasi della lin-    iotismes lombards à foison, phrases de
  gua adoperate a sproposito, grammatica ar-       la langue employées à rebours, construc-
  bitraria, periodi sgangherati.                   tions arbitraires, périodes boiteuses; et
  3 E poi, qualche eleganza spagnola semi-         puis quelques petites élégances espagnoles
  nata qua e là; e poi, ch’è peggio, ne’ lu-       semées ça et là; et puis, ce qui est bien pis,
  oghi più terribili o più pietosi della storia,   dans les endroits les plus terribles ou les
  a ogni occasione d’eccitar maraviglia, o di      plus touchants de son histoire, à chaque
  far pensare, a tutti que’ passi insomma che      occasion d’exciter la surprise ou de faire
  richiedono bensì un po’ di rettorica, ma ret-    penser, à tous les passages enfin qui de-
  torica discreta, fine, di buon gusto, costui     mandent, il est vrai, quelques fleurs de rhé-
  non manca mai di metterci di quella sua          torique, mais d’une rhétorique sobre, fine,
  così fatta del proemio.                          de bon goût, ce digne homme ne manque
                                                   jamais d’y mettre quelque chose dans le
                                                   genre de son début.


Table 6: Italian / German: 2-2 alignment type with the overlapping sentence boundaries
Italian                                            German
1 Né alcuno dirà questa sij imperfettione del      1 Und es wird gewiß niemand sagen, dies
Racconto, e defformità di questo mio rozzo         sei ein Geschichtsfälscher und eine Entstel-
Parto, a meno questo tale Critico non sij per-     lung dieser meiner einfältigen Erzählung,
sona affatto diggiuna della Filosofia: che         es sei denn der Tadel ein Mann, der aller
quanto agl’huomini in essa versati, ben            Weltweisheit vollständig bar wäre.
vederanno nulla mancare alla sostanza di           2 Denn man wird bald sehen, daß in
detta Narratione.                                  Beziehung der darin vorkommenden Per-
2 Imperciocché, essendo cosa evidente,             sonen am Wesentlichsten der besagten
e da verun negata non essere i nomi se             Erzählung nichts fehle; zumal es eine
non puri purissimi accidenti...                    augenfällige, von niemand gelenkte
                                                   Sache ist, daß Namen bloß reine
                                                   Nebensachen seien…


Table 7: Italian / Dutch: 2-2 alignment type with the overlapping sentence boundaries
Italian                                            Dutch
1  Però alla mia debolezza non è lecito            1  Doch mijn’ geringeren krachten is het
solleuarsi a tal’argomenti, e sublimità peri-      niet gegeven zich tot zoo hooge vlugt, tot
colose, con aggirarsi tra Labirinti de’ Politici   zulk eene gevaarvolle verhevenheid te ver-
maneggj, et il rimbombo de’ bellici Ori-           heffen, en zich te wagen in den doolhof der
calchi: solo che hauendo hauuto notitia di         staatkundige spitsvondigheden of te midden
fatti memorabili, se ben capitorno a gente         van het geschal der schorre krijgsklaroenen.
meccaniche, e di piccol affare, mi accingo         2 Naardemaal ’er dus eenige merk-
di lasciarne memoria a Posteri, con far di         waardige gebeurtenissen ter mijner
tutto schietta e genuinamente il Racconto,         kennis gekomen zijn, welke, wel is waar,
ouuero sia Relatione.                              slechts menschen van gering bedrijve en
2 Nella quale si vedrà in angusto Teatro           lage geboorte betreffen, maar des alniet-
luttuose Traggedie d’horrori, e Scene              temin eene rijke vertooning opleveren van
di malvaggità grandiosa, con intermezi             droevige en vreesselijke ongevallen, voor-
d’Imprese virtuose e buontà angeliche, op-         beelden van drieste boosheid, doormengd
poste alle operationi diaboliche.                  met vrome ondernemingen en verheerljkt
                                                   door het zielesterkend schouwspel van
                                                   hemelsche deugd, in onophoudelijken
                                                   strijd met de gruwelijke aanslagen der
                                                   helle, zoo heb ik besloten mij aan te
                                                   gorden om daarvan der nakomelingschap
                                                   een getrouw en nauwkeurig Verhaal ofte
                                                   Relaas achterlaten.


 Table 8: Italian / Spanish: 2-3 alignment type with the overlapping sentence boundaries
  Italian                                          Spanish
  1 Ma che?      quando siamo stati al punto       1 Pero, ¡oh cielos! llegado el momento de re-
  di raccapezzar tutte le dette obiezioni e        capitular las objeciones y sus respuestas y el
  risposte, per disporle con qualche ordine,       de ordenarlas, hallamos, que habíamos he-
  misericordia! venivano a fare un libro.          cho un libro: visto lo cual, abandonamos
  2 Veduta la qual cosa, abbiam messo da           nuestro intento por dos razones, que sin
  parte il pensiero, per due ragioni che il let-   duda alguna el lector considerará oportu-
  tore troverà certamente buone: la prima,         nas. —
  che un libro impiegato a giustificarne un al-    2 La primera, porque temimos que el hacer
  tro, anzi lo stile d’un altro, potrebbe parer    un libro para justificar otro, ó solo su estilo,
  cosa ridicola: la seconda, che di libri basta    parecería cosa ridícula.
  uno per volta, quando non è d’avanzo.            3 La segunda, porque creemos que es sufi-
                                                   ciente, cuando no excesivo, el publicar un
                                                   solo libro á la vez.


D. Examples of Omission Captured through Sentence-Level
   Alignment


            Table 9: Italian / Spanish sentence-level alignment for Chapter 7
  Italian                                          Spanish 1858
  Gertrude domandò sommessamente e                Gertrudis con mucha timidez pidió la
  tremando, che cosa dovesse fare.                explicación de aquellas palabras y lo que
                                                  debía hacer en consecuencia.
  Il principe (non ci regge il cuore di dar-
  gli in questo momento il titolo di padre)
  non rispose direttamente, ma cominciò a
  parlare a lungo del fallo di Gertrude: e
  quelle parole frizzavano sull’animo della
  poveretta, come lo scorrere d’una mano
  ruvida sur una ferita.
  Continuò dicendo che, quand’anche... caso       Él continuó diciendo que... “á pesar de lo
  mai... che avesse avuto prima qualche           ocurrido... en el caso en que... hubiera
  intenzione di collocarla nel secolo, lei        sido con la intención de establecerse en el
  stessa ci aveva messo ora un ostacolo in-       mundo, ella había contraído un lazo indis-
  superabile; giacché a un cavalier d’onore,      oluble y había creado un obstáculo inven-
  com’era lui, non sarebbe mai bastato            cible. Hombre de honor como era, jamás
  l’animo di regalare a un galantuomo una         se habría atrevido á presentarla á ningún
  signorina che aveva dato un tal saggio di       caballero después de tales antecedentes”.
  sé.
  «Ebbene, non si parli più del passato: tutto    En hora buena; no hablemos más de lo
  è cancellato.                                   pasado: todo está olvidado ya.
  Avete preso il solo partito onorevole, con-
  veniente, che vi rimanesse; ma perché
  l’avete preso di buona voglia, e con buona
  maniera, tocca a me a farvelo riuscir gra-
  dito in tutto e per tutto: tocca a me a farne
  tornare tutto il vantaggio e tutto il merito
  sopra di voi.
  Ne prendo io la cura.»


E. Examples of Omission Captured through Segment-Level
   Alignment


            Table 10: Italian / Spanish segment-level alignment for Chapter 7
  Italian                                         Spanish 1858
  «Brava! bene!» esclamarono, a una voce,         — Muy bien, muy bien, exclamaron á la par
  la madre e il figlio,                           madre é hijo.
  e l’uno dopo l’altra abbracciaron Gertrude;
  la quale ricevette queste accoglienze con
  lacrime,
  che furono interpretate per lacrime di con-
  solazione.
  Allora il principe si diffuse a spiegar ciò
  che farebbe per render lieta e splendida la
  sorte della figlia.
  Parlò delle distinzioni di cui goderebbe nel    Entonces el príncipe habló de las
  monastero e nel paese;                          distinciones que Gertrudis habría de tener
                                                  en el convento y en el país.
  che, là sarebbe come una principessa, come
  la rappresentante della famiglia; che,
  appena l’età l’avrebbe permesso, sarebbe
  innalzata alla prima dignità; e, intanto,
  non sarebbe soggetta che di nome.



</pre>