Automatic Translation Alignment Pipeline for Multilingual Digital Editions of Literary Works

Maria Levchenko
maria.levchenko@studio.unibo.it
Dipartimento di Filologia Classica e Italianistica, University of Bologna, Italy

ISSN 1613-0073

Keywords: multilingual digital edition, Alessandro Manzoni, translation alignment, literary translation, embeddings

This paper investigates the application of translation alignment algorithms in the creation of a Multilingual Digital Edition (MDE) of Alessandro Manzoni's Italian novel I promessi sposi ("The Betrothed"), with translations in eight languages (English, Spanish, French, German, Dutch, Polish, Russian and Chinese) from the 19th and 20th centuries. We identify key requirements for the MDE to improve both the reader experience and support for translation studies. Our research highlights the limitations of current state-of-the-art algorithms when applied to the translation of literary texts and outlines an automated pipeline for MDE creation. This pipeline transforms raw texts into web-based, side-by-side representations of original and translated texts with different rendering options. In addition, we propose new metrics for evaluating the alignment of literary translations and suggest visualization techniques for future analysis.

Introduction

From the very beginning of digital edition creation, there has been a tendency, supported by the power of web technologies, to represent not only the original text but also its translation(s), following the tradition of bilingual printed editions. In this paper, we propose to define multilingual digital editions (MDE) as editions in which translations are not supplementary but essential, intended to enrich both computational analysis and reader experience.

Beyond the accessibility of the annotated files, an MDE should meet additional criteria to be effective. First, the platform must display the original text alongside its translations. Aligned pairs should be visually linked to allow straightforward comparison and analysis. Alignment accuracy is a baseline requirement: corresponding parts of the texts must be properly aligned. Furthermore, the platform should support the visual highlighting of omitted or inserted parts in the translations [6], which enables users to discern differences and interpret the nuances of each translation.

These requirements are generally feasible for short, structured texts like poetry or historical documents (examples of MDE publishing strategies are described in Appendix A). The challenge is to develop a flexible, automated system that accurately aligns complex literary texts across multiple languages for computational analysis and user-friendly exploration. The technology should be able to handle the complexities of literary texts, including the splitting, merging, and reordering of sentences, and align text fragments of manageable length, ensuring that they are easy for users to read and understand at a glance to obtain insight into the linguistic and cultural nuances of each version. The automated alignment process should save researchers both time and resources.

For the MDE of Alessandro Manzoni's novel "I promessi sposi" (The Betrothed), we propose an automatic translation alignment pipeline that adapts state-of-the-art alignment techniques to the objectives of the multilingual digital edition of literary works for educational and research purposes.

The Betrothed by Alessandro Manzoni and Its Translations

A comparative analysis of translations of the same literary work over time can provide valuable insights into the evolution of interpretation and understanding. "I promessi sposi" is particularly compelling in this context. Not only does it reflect the author's exploration of the Italian language during a period of significant linguistic evolution, but it has also been translated into many European languages over the past two centuries. This makes it an ideal case study for investigating the influence of temporal factors, linguistic shifts, and the reception of the original novel in different cultural contexts.

Two main original editions (1827 and 1840) were translated into European languages and published in parallel in the 19th century. For the development of an automated translation alignment pipeline, we selected and prepared the texts of a wide range of translations of the classic edition of the novel, also known as the Quarantana (1840): English translations from 1845, 1876, 1983, and 2022; Russian translations from 1854 and 1999; and one translation each into Dutch (1849), German (1884), French (1874), Spanish (1858), Polish (1882), and Chinese (1998) (see Appendix B for a list of translations).

Related Work

The core of the MDE creation process is translation alignment, which involves mapping corresponding units (typically words or sentences) between a source and target text. State-of-the-art alignment algorithms have evolved significantly in recent years and now perform well in many applications, including machine translation, bilingual dictionary creation, and parallel corpus development.

Modern methods have moved from statistical approaches [5,20,18,14,15,10] and lexical associations (Hunalign in [22]), first to the use of machine translation (MT) systems and then to multilingual sentence embeddings, which significantly improve accuracy (LASER in [2] and LaBSE in [8]). Thompson and Koehn's Vecalign [21] uses LASER embeddings and a recursive dynamic programming approach to achieve state-of-the-art results while reducing complexity from quadratic to linear. These methods use multilingual models to generate embeddings for each sentence, which are then compared using cosine similarity to find the best matches between the original and translated sentences. Liu and Zhu [17] …
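The embedding comparison described above can be illustrated with a minimal sketch. It assumes the sentence embeddings have already been computed by some multilingual model; the three-dimensional vectors below are invented toy values, and the function names `cosine` and `best_matches` are hypothetical, not part of any cited tool.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_matches(src_vecs, tgt_vecs):
    """For each source vector, return (target index, similarity) of the closest target."""
    matches = []
    for u in src_vecs:
        scores = [cosine(u, v) for v in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        matches.append((j, scores[j]))
    return matches

# Toy "embeddings" (assumption: real models produce hundreds of dimensions).
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
tgt = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = best_matches(src, tgt)
```

This greedy nearest-neighbour lookup ignores sentence order; the aligners cited above additionally constrain the matching with dynamic programming.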

Challenges of the Sentence-Level Alignment

While the alignment of text and translation at the line level is sufficient for poetic, historical, and even verse dramatic texts (see [1]), where we cannot expect significant variation in the splitting, merging, or reordering of lines, this approach is inadequate for prose because of the extent of restructuring that inevitably occurs in literary prose translation. In such cases, the standard approach is sentence-level alignment. However, it can be challenging, particularly for literary translations, because the syntactic structure of the original is rarely reproduced regularly in another language. Literary translators are not limited to translating a single sentence into another single sentence (a one-to-one alignment) but are free to move sentence boundaries and reconfigure sentence structures to better convey the meaning and style of the original text. In such cases, to achieve the highest similarity score for the aligned pairs, alignment algorithms are forced to combine several sentences into one, producing one-to-many, many-to-one, and many-to-many alignments.

The ideal alignment type for the MDE is one-to-one, which maintains the granularity and consistency of the alignment. In our analysis of the sentence-level alignment of I promessi sposi (see Figure 1 for the different translations), one-to-one alignments are the most common, but a significant proportion are of more complex types. This does not inherently complicate the alignment process, as advanced tools such as Bertalign and Vecalign can handle this complexity, but the results may be less meaningful. The aligned pairs become longer, including several sentences from both the source text and the translation (examples can be seen in Appendix C). The edge case of this expansion would be the pairing of a whole paragraph or even a chapter of the original text with its counterpart in the translation.

For this reason, traditional metrics may not be sufficient for evaluating the alignment results. The performance of alignment algorithms is typically evaluated using established metrics such as precision, recall, F1 score, and Alignment Error Rate (AER) [23]. The first limitation of this approach is that it is based on a "gold dataset," which does not provide insight into the performance of the algorithm on other types of text [9]. The second is that the scores may be high while the results are unsuitable for an MDE, because the aligned pairs are too large to be analyzed or identified at a glance by a human observer. We therefore suggest that, in addition to the distribution of alignment types (one-to-one, one-to-many, many-to-one, many-to-many) as a metric of the acceptability of the results, the number and length of alignment pairs derived from the original sentences should also be considered. A number of aligned pairs close to the number of original sentences would indicate an effective alignment process. Conversely, a significant reduction in the number of aligned pairs would indicate limitations of the sentence-level approach, as it implies that the alignment algorithm is forced to combine more sentences to obtain an appropriate similarity score.
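The three indicators suggested above (distribution of alignment types, number of aligned pairs relative to the number of original sentences, and length of the pairs) can be computed as in the following sketch. The input format is an assumption: each alignment is a pair of index lists into the source and target sentence lists. Token counts use whitespace splitting as a simplification, and the sentences are toy placeholders, not text from the edition.

```python
from collections import Counter

def alignment_type(src_ids, tgt_ids):
    """Label an aligned pair by how many sentences it holds on each side."""
    if not src_ids or not tgt_ids:
        return "unaligned"
    src_kind = "one" if len(src_ids) == 1 else "many"
    tgt_kind = "one" if len(tgt_ids) == 1 else "many"
    return f"{src_kind}-to-{tgt_kind}"

def alignment_report(alignments, src_sentences, tgt_sentences):
    """Summarise type distribution, pair/sentence ratio and longest pair."""
    types = Counter(alignment_type(s, t) for s, t in alignments)

    def n_tokens(sentences, ids):
        # Whitespace tokenisation (assumption; a real tokenizer may differ).
        return sum(len(sentences[i].split()) for i in ids)

    pair_lengths = [
        (n_tokens(src_sentences, s), n_tokens(tgt_sentences, t))
        for s, t in alignments
    ]
    return {
        "types": dict(types),
        # Close to 1.0 means few source sentences had to be merged.
        "pair_ratio": len(alignments) / len(src_sentences),
        # Longest side of the longest pair, in tokens.
        "longest_pair": max(max(p) for p in pair_lengths),
    }

src = ["uno due", "tre", "quattro cinque sei", "sette"]
tgt = ["one two", "three four five six", "seven", "eight"]
alignments = [([0], [0]), ([1, 2], [1]), ([3], [2, 3])]
report = alignment_report(alignments, src, tgt)
```

In this toy case three pairs cover four source sentences, so the ratio of 0.75 signals that merging took place.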

The length of alignment pairs indicates whether they are suitable for human readers. In a digital edition for educational or research purposes with multiple languages, it is not advisable to present long aligned texts, given the limited attention span and working memory of readers (for further insight, see studies of working memory and comprehension in multiple-text reading [11,12]).

To illustrate, if a sentence in the source language (Italian) is 130 tokens long and its corresponding sentence in the target language (English) is 140 tokens long, readers may encounter difficulties in comparing and understanding such lengthy segments. Even the use of color differentiation to highlight aligned pairs does not overcome this challenge (see Figure 2).

In summary, in the context of MDEs of literary works, sentence-level alignment still faces significant challenges: 1) sentence boundaries are not stable across languages, which leads to a variety of alignment types and prevents consistent alignment across the MDE; 2) strict sentence-level alignment does not fully reflect the variability of the translated texts, such as inserted or omitted parts; and 3) it strains readers' attention spans and working memory and fails to achieve an alignment granularity that is comfortable for reading. The alignment process needs to be modified to address these challenges for both automated processing and readability.

Sentence Segmentation as an Alternative Solution

Alternative methods can provide more accurate and meaningful segmentation of literary texts. To move from sentence-level alignment to phrase- or segment-level alignment, we consider two promising approaches:

• Punctuation splitting: Using punctuation marks (such as commas, periods, and semicolons) to create initial segments. This method provides natural breaks in the text, preserving the contextual meaning. However, when we aligned the resulting segments with Bertalign, we achieved a more granular alignment but also more reordering problems, which did not occur with sentence-level alignment.
• Zero/Few-Shot Prompting with LLMs: The sentences of the original text are segmented using zero-shot prompting of the OpenAI GPT-4o model [19]. The approved segments are then used as patterns for few-shot prompting to segment the sentences of the translations. This approach provides a robust foundation for universal alignment.
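The punctuation-splitting approach can be sketched as follows. This is an illustration only: the set of splitting marks (comma, semicolon, period) follows the examples named above, the function name `split_on_punctuation` is hypothetical, and a production segmenter would need to handle quotation marks, abbreviations and ellipses.

```python
import re

# Split at whitespace that follows a comma, semicolon or period.
# The punctuation mark stays attached to the segment it closes.
_BOUNDARY = re.compile(r"(?<=[,;.])\s+")

def split_on_punctuation(sentence):
    """Return the punctuation-delimited segments of a sentence."""
    return [seg for seg in _BOUNDARY.split(sentence.strip()) if seg]

sentence = (
    "Presto, io spero, potrete ritornar sicuri a casa vostra; "
    "a ogni modo, Dio vi provvederà, per il vostro meglio;"
)
segments = split_on_punctuation(sentence)
```

On this passage from the novel the rule yields six short segments, which shows both the finer granularity and the risk noted above: very short segments such as "io spero," are easier to misalign or reorder.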

The similarity score can be visualized to evaluate segment-level alignment and compare its results with traditional sentence-level alignment. In addition, the visual representation of the similarity score of the aligned segments or sentences allows us to find the semantic outliers.

After extracting high-dimensional embeddings for each aligned line from the original and translated text using the multilingual model (LaBSE), we applied t-Distributed Stochastic Neighbour Embedding (t-SNE) to reduce them to two dimensions. By visually examining the cosine similarity, we can detect anomalies and curious translated fragments (see Figure 3), even if the alignment algorithm establishes the correlation between sentences.
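As a complement to visual inspection, low-similarity pairs can also be flagged numerically. The sketch below is a deliberately simple substitute for the t-SNE view, not the procedure described above: it marks aligned pairs whose cosine similarity falls more than `k` standard deviations below the mean. The scores are invented toy values, and both the function name `flag_outliers` and the default `k=1.5` are assumptions.

```python
import statistics

def flag_outliers(similarities, k=1.5):
    """Return indices of pairs whose similarity is below mean - k * std."""
    mean = statistics.mean(similarities)
    std = statistics.pstdev(similarities)
    threshold = mean - k * std
    return [i for i, s in enumerate(similarities) if s < threshold]

# One similarity score per aligned pair (toy values).
scores = [0.82, 0.79, 0.85, 0.31, 0.80, 0.77]
outliers = flag_outliers(scores)
```

Flagged pairs are candidates for manual review; a low score may reflect a free translation, an omission, or an alignment error.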

There are several quantitative metrics that can be used to assess the quality of alignment:

• Comparing sentence-level and segment-level alignment, we can assume that even sentence-level alignment provides valuable insights into the differences between the translations and the original text (see Appendix D for examples in the Spanish translation of I promessi sposi); segment-level alignment allows us to go deeper and capture more nuanced variations between the original and the translated text. For example, we can identify the omission of the end of chapter 8 in the German translation (see Table 1) and two omissions in the Russian translation of chapter 1 (see Tables 2 and 3), which can be interpreted through the lens of cultural differences and/or censorship and could not be captured by sentence-level alignment.

Table 1 (continued)

Original: Voi,» continuò volgendosi alle due donne, «potrete fermarvi a ***.
German 1880: Und ihr», fuhr er, zu den beiden Frauen gewandt, fort, «ihr könnt euch in *** so lange aufhalten.

By providing a more granular and accurate alignment, the segment-level approach also allows the length of aligned pairs to be reduced (increasing their number) and makes the MDE more suitable for reader reception compared to sentence-level alignment (see Figure 4).

[Figure axis: chapters, from Intro and Cap1 to Cap38]

Multilingual Digital Edition Pipeline

The automated pipeline for the MDE is proposed as a means of enabling creators to prepare annotated TEI files that are accessible, adaptable, correct, easily parsed by computational tools, and rendered for readers (see Figure 5). We start with the raw texts of the translations in TXT format, obtained after OCR and error checking. For Manzoni's text, we used TEI files with identifiers assigned to each token. This preparation allows us to take into account the irregular segmentation to be expected due to inconsistencies across the languages.

Step 1. The choice of the segmentation method. Based on the above analysis and the specifics of the texts to be published, the MDE developers can select the segmentation methodology in accordance with the projected audience and the project's objectives, enabling alignment at the sentence, phrase, or word level, or a combination, giving readers the flexibility to choose their preferred option.

Steps 2-3. Segmentation of the original text and the translations. Depending on the decision made in the first step, the text can be split into sentences, segments, or even tokens.

Step 4. Applying alignment algorithms. By default, we applied Bertalign with the LaBSE model, trained on 109 languages [7], to the segments obtained in the previous steps. Other multilingual sentence-transformer models, such as BGE M3-Embedding, can also be used [4].
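To make the alignment step more concrete, the following toy sketch shows monotone alignment by dynamic programming with 1-1, 1-2 and 2-1 steps. It is not Bertalign and does not use LaBSE: similarity is replaced by token-set overlap (Jaccard) so that the example runs without any model, and the names `jaccard` and `align` are hypothetical. Embedding-based aligners follow the same general idea with cosine similarity and further refinements.

```python
def jaccard(a, b):
    """Token-set overlap between two strings (stand-in for embedding similarity)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def align(src, tgt, sim=jaccard, steps=((1, 1), (1, 2), (2, 1))):
    """Return the monotone alignment with the highest summed similarity."""
    n, m = len(src), len(tgt)
    neg = float("-inf")
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in steps:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or score[pi][pj] == neg:
                    continue
                # Merged spans are scored on their concatenated text.
                s = score[pi][pj] + sim(" ".join(src[pi:i]), " ".join(tgt[pj:j]))
                if s > score[i][j]:
                    score[i][j] = s
                    back[i][j] = (di, dj)
    if score[n][m] == neg:
        raise ValueError("no alignment possible with the allowed steps")
    # Walk the back-pointers from the end to recover the aligned pairs.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        pairs.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return pairs[::-1]

src = ["a b", "c d", "e f"]
tgt = ["a b c d", "e f"]
pairs = align(src, tgt)
```

Here the first two source segments are merged into a many-to-one pair because that yields a higher total similarity than any alternative path.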

Step 5. Choosing the encoding approach. The encoding approach determines the flexibility of the alignment description for future rendering and for establishing a link between the original and translated texts. For structured texts, where each segment in the original closely corresponds to an equivalent segment in the translations, it may be appropriate to mark each segment with the same identifier. Given the complexity of multilingual alignment, we have taken a different approach. The TEI encoding of the original text includes identifiers for each token, providing granular reference points. The TEI-encoded translation text is divided into segments, each referencing the start and end identifiers from the original text, allowing for flexible and accurate alignment.

Step 6. Encoding. By iterating over the alignment results, we assign the referencing start and end identifiers from the original text to each aligned segment from the translation and generate a new TEI file for the translation, ensuring that all segments are accurately linked to the corresponding elements.
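The encoding loop in Steps 5 and 6 might look like the sketch below. The element and attribute names are assumptions made for illustration: the source does not state which TEI elements or attributes carry the start and end references, so `seg`, `from` and `to` here are hypothetical choices, as are the token identifiers and the function name `encode_translation`.

```python
import xml.etree.ElementTree as ET

def encode_translation(alignments, src_token_ids, tgt_segments):
    """Build a <body> whose <seg> children point at source token ids.

    alignments: list of (source segment index, target segment index)
    src_token_ids: for each source segment, the ids of its tokens in order
    tgt_segments: translated segments as plain strings
    """
    body = ET.Element("body")
    for src_idx, tgt_idx in alignments:
        ids = src_token_ids[src_idx]
        # Hypothetical attributes: first and last token id of the source span.
        seg = ET.SubElement(body, "seg", {"from": "#" + ids[0], "to": "#" + ids[-1]})
        seg.text = tgt_segments[tgt_idx]
    return body

src_token_ids = [["w1", "w2", "w3"], ["w4", "w5"]]
tgt_segments = ["first segment", "second segment"]
tei_body = encode_translation([(0, 0), (1, 1)], src_token_ids, tgt_segments)
xml_text = ET.tostring(tei_body, encoding="unicode")
```

Referencing a start and an end token, rather than giving both texts a shared segment identifier, lets each translation segment the source differently without touching the encoding of the original.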

Step 7. Rendering Aligned Texts on the Web. The original and translated texts are rendered from the TEI files as two columns on a web page, with separate XSLT templates for the original and the translation. This interactive interface allows users to click on the original text and see the corresponding translation fragment highlighted, providing an intuitive way to explore and compare the translations side by side.

Step 8. Visualization and evaluation. While highly unstable text versions [13] or line-level aligned translations [16] can be effectively visualized with a Sankey diagram or bipartite graph, the alignment results for the modern translations can be visualized with the approach described above, based on the embedding vectors with t-SNE and clustering with DBSCAN. As for the presentation in the user interface, ideally, all multilingual translations should be comparable and aligned with each other, allowing the user to see and interpret the differences.

Future Development and Challenges

• Current alignment algorithms face challenges in accurately aligning segments with reordered content. Future work will focus on improving the alignment performance in such cases, ensuring more precise matches even when the original and translated texts differ significantly in structure.
• Previous studies on user behavior in digital editions have analyzed log files to understand interaction patterns [3]. To gain deeper insights, we are using advanced tools such as ReactFlow to study more comprehensively how users interact with different elements of MDEs. For example, when readers view two lines in different languages side by side, the optimal reading span may differ from traditional reading practices. By analysing user interactions, we aim to determine the most effective segment length for the MDEs.

Conclusion

The proposed pipeline aims to improve the development of Multilingual Digital Editions (MDE) by ensuring that MDE is both methodologically robust and user-centered. By prioritizing user experience and usability, the pipeline adapts existing computational methods and algorithms to the specific needs of educational and research applications.

We have also proposed new metrics for MDEs that focus on the consistency, meaningfulness and granularity of the alignment. These metrics assess the suitability of an alignment for educational and research purposes. By ensuring that the alignment is accessible to human readers while supporting translation studies, the pipeline balances conciseness and reader engagement.

C. Examples of Many-to-Many Alignment Type

D. Examples of Omission Captured through Sentence-Level Alignment

Table 9: Italian / Spanish segment level alignment for the Chapter 7

Italian: Gertrude domandò sommessamente e tremando, che cosa dovesse fare.
Spanish 1858: Gertrudis con mucha timidez pidió la explicación de aquellas palabras y lo que debía hacer en consecuencia.

Italian: Il principe (non ci regge il cuore di dargli in questo momento il titolo di padre) non rispose direttamente, ma cominciò a parlare a lungo del fallo di Gertrude: e quelle parole frizzavano sull'animo della poveretta, come lo scorrere d'una mano ruvida sur una ferita.

Italian: Continuò dicendo che, quand'anche... caso mai... che avesse avuto prima qualche intenzione di collocarla nel secolo, lei stessa ci aveva messo ora un ostacolo insuperabile; giacché a un cavalier d'onore, com'era lui, non sarebbe mai bastato l'animo di regalare a un galantuomo una signorina che aveva dato un tal saggio di sé.
Spanish 1858: Él continuó diciendo que... "á pesar de lo ocurrido... en el caso en que... hubiera sido con la intención de establecerse en el mundo, ella había contraído un lazo indisoluble y había creado un obstáculo invencible. Hombre de honor como era, jamás se habría atrevido á presentarla á ningún caballero después de tales antecedentes".

Italian: «Ebbene, non si parli più del passato: tutto è cancellato.
Spanish 1858: En hora buena; no hablemos más de lo pasado: todo está olvidado ya.

Italian: Avete preso il solo partito onorevole, conveniente, che vi rimanesse; ma perché l'avete preso di buona voglia, e con buona maniera, tocca a me a farvelo riuscir gradito in tutto e per tutto: tocca a me a farne tornare tutto il vantaggio e tutto il merito sopra di voi. Ne prendo io la cura.»

E. Examples of Omission Captured through Segment-Level Alignment

Table 10: Italian / Spanish segment level alignment for the Chapter 7

Italian: «Brava! bene!» esclamarono, a una voce, la madre e il figlio,
Spanish 1858: -Muy bien, muy bien, exclamaron á la par madre é hijo.

Italian: e l'uno dopo l'altra abbracciaron Gertrude; la quale ricevette queste accoglienze con lacrime, che furono interpretate per lacrime di consolazione. Allora il principe si diffuse a spiegar ciò che farebbe per render lieta e splendida la sorte della figlia.

Italian: Parlò delle distinzioni di cui goderebbe nel monastero e nel paese;
Spanish 1858: Entonces el príncipe habló de las distinciones que Gertrudis habría de tener en el convento y en el país.

Italian: che, là sarebbe come una principessa, come la rappresentante della famiglia; che, appena l'età l'avrebbe permesso, sarebbe innalzata alla prima dignità; e, intanto, non sarebbe soggetta che di nome.

Figure 1: Alignment Types in The Betrothed

Figure 2: The visualization of the alignment of the long sentence.

Figure 3: Similarity visualisation for sentence-level alignment in the German translation of chapter 23. This translation omits Don Abbondio's inner monologue, which is not captured in the sentence-level alignment, but is evident in the visualisation, where several Italian sentences appear without corresponding German pairs.

Figure 4: Reducing the Lengths of Aligned Pairs for the Spanish 1858

Figure 5: The Translation Alignment Pipeline

Table 1: Italian / German segment-level alignment

Original: Presto, io spero, potrete ritornar sicuri a casa vostra;
German 1880: Ich hoffe, ihr werdet bald ohne Gefahr in euer Haus zurückkehren können;

Original: a ogni modo, Dio vi provvederà, per il vostro meglio;
German 1880: in jedem Falle wird Gott Alles zu euerm Besten lenken.

Original: e io certo mi studierò di non mancare alla grazia che mi fa, scegliendomi per suo ministro, nel servizio di voi suoi poveri cari tribolati.


Table 2: Italian / Russian segment-level alignment

Original: Ai tempi in cui accaddero i fatti che prendiamo a raccontare, quel borgo, già considerabile, era anche un castello,
Russian 1854: Во время тех событий, которые мы намерены описать, Лекко было уже значительным местечком и маленькой крепостцей;

Original: e aveva perciò l'onore d'alloggiare un comandante, e il vantaggio di possedere una stabile guarnigione di soldati spagnoli,
Russian 1854: вследствие чего в нем жили комендант и постоянный гарнизон испанских солдат,

Original: che insegnavan la modestia alle fanciulle e alle donne del paese, accarezzavan di tempo in tempo le spalle a qualche marito, a qualche padre; e, sul finir dell'estate, non mancavan mai di spandersi nelle vigne, per diradar l'uve, e alleggerire a' contadini le fatiche della vendemmia.
Russian 1854: которые занимались собиранием винограда.

Table 3: Italian / Russian segment-level alignment

Original: Con tutto ciò,
Russian 1854: Несмотря на все,

Original: anzi in gran parte a cagion di ciò,
Russian 1854: и может быть, потому именно,

Original: quelle gride, ripubblicate e rinforzate di governo in governo, non servivano ad altro che ad attestare ampollosamente l'impotenza de' loro autori,

Original: o, se producevan qualche effetto immediato...
Russian 1854: если декреты имели минутную действительность...

Table 4: Strategies of Text/Translation Representation in Multilingual Digital Editions (continued)

Project Name | Alignment Type | Comparison | Notes
The Community of the Realm in Scotland | sentences | + | The side-by-side viewer of the Latin text with its English translation aligns the sentences, allowing users to click on the sentence number in the original text, which automatically scrolls the other side of the page to the corresponding sentence.
Tabula Salomonis | lines | + | The TEI Publisher tool allows the user to highlight corresponding parts and automatically scroll when hovering over the lines.

B. Translations

• (English 1845). The Betrothed Lovers: A Milanese Story of the Seventeenth Century. With the Column of Infamy. By Alessandro Manzoni. In Three Volumes. Henry Francis C. Logan. London: Longman, Brown, Green, and Longmans, Paternoster-Row.
• (English 1876). The Betrothed, by Alessandro Manzoni. London, G. Bell and Sons, 1876.
• (English 1983). Alessandro Manzoni, The Betrothed, Bruce Penman (tr.), Penguin Random House UK. London, 1983.
• (English 2022). The Betrothed. A novel, translated and with Introduction of Michael Moore, Preface by Pulitzer Prize-Winning Author Jhumpa Lahiri, Modern Library, 2022.
• (Russian 1854). Обрученные : Медиолан. быль XVIII [!XVII] столетия, найден. и передел. Александром Манзони / Пер. с итал. В.С. Межевича. Ч. 1-4. Москва, 1854. 4 т.; 20. (Библиотека романов, повестей, путешествий и записок, изд. Н.Н. Улитиным; Вып. 7, т. 1-2, 6-7).
• (Russian 1999). Обрученные [Повесть из истории Милана XVII в.] / А. Мандзони; [Пер. с итал. под ред. Н. Георгиевской, А. Эфроса]. Москва: Терра-Книжный клуб, 1999.
• (Dutch 1849). De verloofden: eene Milanesche geschiedenis uit de zeventiende eeuw. Vol. 1. Translated by Petrus Van Limburg Brouwer. Groningen, Van Boekeren, 1849.
• (German 1884). Die Verlobten: eine Mailändischer Geschichte aus dem 17. Jahrhundert, Volume 1. 3rd ed. Regensburg, G.J. Manz, 1884.
• (French 1874). Les fiancés: histoire milanaise du XVIIe siècle / Alexandre Manzoni; traduite de l'italien par Rey Dussueil. Paris: Charpentier, 1874.

Table 5: Italian / French: 3-1 alignment type

Italian:
1 Sì; ma com'è dozzinale! com'è sguaiato! com'è scorretto!
2 Idiotismi lombardi a iosa, frasi della lingua adoperate a sproposito, grammatica arbitraria, periodi sgangherati.
3 E poi, qualche eleganza spagnola seminata qua e là; e poi, ch'è peggio, ne' luoghi più terribili o più pietosi della storia, a ogni occasione d'eccitar maraviglia, o di far pensare, a tutti que' passi insomma che richiedono bensì un po' di rettorica, ma rettorica discreta, fine, di buon gusto, costui non manca mai di metterci di quella sua così fatta del proemio.

French:
1 Oui; mais comme il est commun! comme il est inégal! comme il est incorrect! idiotismes lombards à foison, phrases de la langue employées à rebours, constructions arbitraires, périodes boiteuses; et puis quelques petites élégances espagnoles semées ça et là; et puis, ce qui est bien pis, dans les endroits les plus terribles ou les plus touchants de son histoire, à chaque occasion d'exciter la surprise ou de faire penser, à tous les passages enfin qui demandent, il est vrai, quelques fleurs de rhétorique, mais d'une rhétorique sobre, fine, de bon goût, ce digne homme ne manque jamais d'y mettre quelque chose dans le genre de son début.

Table 6: Italian / German: 2-2 alignment type with the overlapping sentence boundaries

Italian:
1 Né alcuno dirà questa sij imperfettione del Racconto, e defformità di questo mio rozzo Parto, a meno questo tale Critico non sij persona affatto diggiuna della Filosofia: che quanto agl'huomini in essa versati, ben vederanno nulla mancare alla sostanza di detta Narratione.
2 Imperciocché, essendo cosa evidente, e da verun negata non essere i nomi se non puri purissimi accidenti...

German:
1 Und es wird gewiß niemand sagen, dies sei ein Geschichtsfälscher und eine Entstellung dieser meiner einfältigen Erzählung, es sei denn der Tadel ein Mann, der aller Weltweisheit vollständig bar wäre.
2 Denn man wird bald sehen, daß in Beziehung der darin vorkommenden Personen am Wesentlichsten der besagten Erzählung nichts fehle; zumal es eine augenfällige, von niemand gelenkte Sache ist, daß Namen bloß reine Nebensachen seien…

Table 7: Italian / Dutch: 2-2 alignment type with the overlapping sentence boundaries

Italian:
1 Però alla mia debolezza non è lecito solleuarsi a tal'argomenti, e sublimità pericolose, con aggirarsi tra Labirinti de' Politici maneggj, et il rimbombo de' bellici Oricalchi: solo che hauendo hauuto notitia di fatti memorabili, se ben capitorno a gente meccaniche, e di piccol affare, mi accingo di lasciarne memoria a Posteri, con far di tutto schietta e genuinamente il Racconto, ouuero sia Relatione.
2 Nella quale si vedrà in angusto Teatro luttuose Traggedie d'horrori, e Scene di malvaggità grandiosa, con intermezi d'Imprese virtuose e buontà angeliche, opposte alle operationi diaboliche.

Dutch:
1 Doch mijn' geringeren krachten is het niet gegeven zich tot zoo hooge vlugt, tot zulk eene gevaarvolle verhevenheid te verheffen, en zich te wagen in den doolhof der staatkundige spitsvondigheden of te midden van het geschal der schorre krijgsklaroenen.
2 Naardemaal 'er dus eenige merkwaardige gebeurtenissen ter mijner kennis gekomen zijn, welke, wel is waar, slechts menschen van gering bedrijve en lage geboorte betreffen, maar des alniettemin eene rijke vertooning opleveren van droevige en vreesselijke ongevallen, voorbeelden van drieste boosheid, doormengd met vrome ondernemingen en verheerljkt door het zielesterkend schouwspel van hemelsche deugd, in onophoudelijken strijd met de gruwelijke aanslagen der helle, zoo heb ik besloten mij aan te gorden om daarvan der nakomelingschap een getrouw en nauwkeurig Verhaal ofte Relaas achterlaten.

Table 8: Italian / Spanish: 2-3 alignment type with the overlapping sentence boundaries

Italian:
1 Ma che? quando siamo stati al punto di raccapezzar tutte le dette obiezioni e risposte, per disporle con qualche ordine, misericordia! venivano a fare un libro.
2 Veduta la qual cosa, abbiam messo da parte il pensiero, per due ragioni che il lettore troverà certamente buone: la prima, che un libro impiegato a giustificarne un altro, anzi lo stile d'un altro, potrebbe parer cosa ridicola: la seconda, che di libri basta uno per volta, quando non è d'avanzo.

Spanish:
1 Pero, ¡oh cielos! llegado el momento de recapitular las objeciones y sus respuestas y el de ordenarlas, hallamos, que habíamos hecho un libro: visto lo cual, abandonamos nuestro intento por dos razones, que sin duda alguna el lector considerará oportunas.
2 La primera, porque temimos que el hacer un libro para justificar otro, ó solo su estilo, parecería cosa ridícula.
3 La segunda, porque creemos que es suficiente, cuando no excesivo, el publicar un solo libro á la vez.


Acknowledgments

This research was supported by the Dipartimento di Filologia Classica e Italianistica, University of Bologna, as part of the project "Manzoni online2: manoscritti e documenti inediti, tradizione e traduzioni" (CUP J34I19003370001, project code 2017CFZFAY_003). For more information on the Leggo Manzoni project, visit https://projects.dharc.unibo.it/leggomanzoni.

Appendixes

A. Strategies of Text/Translation Representation in MDE

Project Name | Alignment Type | Comparison | Notes

Separate pages for the text and the translation
Decameron web | - | - |
La entretenida by Miguel de Cervantes | - | - |

Same page for the text and the translation with JS switcher
Furnace and Fugue | - | - |
 | | | The original text in Old Church Slavonic is directly followed by its corresponding parallel Greek text.

Dynamic alignment display
Electronic Beowulf | lines | + | When the special view type and option are selected, and the user hovers the mouse over a line, the translation appears in a special area.
Kassák Lajos: The Horse Dies the Birds Fly Away | lines | + | The page displays side-by-side views of the original text and its translations into two other languages, highlighting the corresponding translated line when the mouse hovers over the original line.
