<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ItGraSyll: A Computational Analysis of Graphical Syllabification and Stress Assignment in Italian</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Liviu</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
							<email>ldinu@fmi.unibuc.ro</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Bucharest</orgName>
								<address>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">Human Language Technologies Research Center</orgName>
								<address>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bogdan</forename><surname>Iordache</surname></persName>
							<email>iordache.bogdan1998@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Bucharest</orgName>
								<address>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">Human Language Technologies Research Center</orgName>
								<address>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bianca</forename><surname>Guita</surname></persName>
							<email>bianca.guita@s.unibuc.ro</email>
							<affiliation key="aff2">
								<orgName type="department">Human Language Technologies Research Center</orgName>
								<address>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Simona</forename><surname>Georgescu</surname></persName>
							<email>simona.georgescu@lls.unibuc.ro</email>
							<affiliation key="aff1">
								<orgName type="department">Faculty of Foreign Languages and Literatures</orgName>
								<orgName type="institution">University of Bucharest</orgName>
								<address>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="department">Human Language Technologies Research Center</orgName>
								<address>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alina</forename><surname>Cristea</surname></persName>
							<affiliation key="aff2">
								<orgName type="department">Human Language Technologies Research Center</orgName>
								<address>
									<settlement>Bucharest</settlement>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ItGraSyll: A Computational Analysis of Graphical Syllabification and Stress Assignment in Italian</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">FE6795A930C2484D41A4A98256FC01D5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:33+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>syllabification</term>
					<term>stress assignment</term>
					<term>Italian</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we build a dataset of Italian graphical syllables (called ItGraSyll). We perform quantitative and qualitative analyses on the syllabification and stress assignment in Italian. We propose a machine learning model, based on deep-learning techniques, for automatically inferring syllabification and stress assignment. For stress prediction we report 94.45% word-level accuracy, and for syllabification we report 98.41% word-level accuracy and 99.82% hyphen-level accuracy.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Word syllabification and syllable analysis are two closely related topics of great importance in the study of language, both written and spoken. They have attracted a wide range of researchers, from linguists (phoneticians in particular) to psycholinguists, computer scientists and speech therapists. The syllable plays an important role in language learning and acquisition, speech recognition and speech production <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref>, language similarity <ref type="bibr" target="#b2">[3]</ref>, text comprehensibility (the Flesch-Kincaid formula <ref type="bibr" target="#b3">[4]</ref>), speech therapy and poetry analysis <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, among others. Each language has its own way of grouping sounds into syllables and its own rules for dividing words into syllables. Linguistically, the syllable represents "the smallest phonetic slice likely to receive an accent and only one" <ref type="bibr" target="#b6">[7]</ref>, and De Saussure <ref type="bibr" target="#b7">[8]</ref> places the syllabic cut at the border between the implosion and the explosion of the spoken sound: "If in a chain of sounds one goes from implosion to explosion, one obtains a particular effect which is the indication of the boundary of the syllable".</p><p>The analysis of the syllabic structure of words also plays an important part in historical linguistics <ref type="bibr" target="#b8">[9]</ref>, not only in diachronic phonetics and phonology, but also in lexicology. Romance comparative linguistics, in particular, still lacks a detailed overview of this aspect, even though syllable structure, segmentation and prosody could account for phonetic changes that have not yet been explained. 
The "prosodic revolution" <ref type="bibr" target="#b9">[10]</ref> from Latin to the Romance languages, including large-scale syncope (the loss of an intermediate syllable) and apocope (the loss of the final syllable), led to major changes, but their weight differs from one idiom to another: the Western Romance languages depart markedly from the Latin phonological and prosodic system, the Eastern languages are considered the most conservative in this respect, and Italian seems to lie in between <ref type="bibr" target="#b9">[10]</ref>. Moreover, in Latin the relation between stress and quantity grew stronger, so that short stressed vowels progressively gained length. Notably, this situation is best preserved in Italian rather than in the Eastern Romance idioms: in Italian, stress cannot skip a heavy penultimate syllable and cannot fall further back than the antepenultimate syllable, a twofold feature characteristic of the Latin prosodic system. This is why we take Italian as the starting point for a larger-scale study covering all Romance languages. The main difference between Latin and its modern descendants is that Latin stress was quantity-sensitive, which yields the following rule: in polysyllabic words, stress fell on a heavy penultimate (i.e., one containing a long vowel), and otherwise on the antepenultimate. Because vowel quantity collapsed as a distinctive feature of the vocalic system, no Romance language has retained the Latin stress rule as such <ref type="bibr" target="#b9">[10]</ref>. Since, statistically, most of the Romance lexicon consists of words stressed on the penultimate syllable, a basic automatic mechanism would assign penultimate stress by default, whereas for both final and antepenultimate stress the machine (like non-native speakers, in not a few cases) would need further information. 
As a consequence of the loss of Latin vowel quantity, Romance stress is no longer fully predictable. This partially explains why most traditional Romance comparative or historical grammars have no section devoted to syllabification <ref type="bibr" target="#b10">[11]</ref>, or, if they do, it focuses either on general prosodic features <ref type="bibr" target="#b11">[12]</ref> or on the evolution of vowels depending on whether they occur in an open or a closed syllable <ref type="bibr" target="#b12">[13]</ref>. A dedicated section on syllabification is also missing from the historical grammars of Italian <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b14">15]</ref>. In this research we focus only on the written form of words, so we investigate only graphical syllabification and stress. By focusing on graphical syllabification and stress in Italian, we aim to take a step towards a complete evaluation of the prosodic changes that took place in the transition from Latin to the Romance languages, and of their influence on Romance phonetics and phonology. A machine-learning model capable of automatically inferring graphical syllabification and stress assignment, together with a database containing a quantitative and qualitative description of syllabification and stress in the Romance languages, could be the first important step in the greater challenge of tracing the similarities and differences among the Romance languages and, more importantly, between Romance and Latin. From a typological point of view, the study of syllabification and stress can shed new light on the universal features that, by defining our phono-articulatory and phono-acoustic apparatus, have guided the development and change of languages. 
Given the promising results of this analysis, the present study can lay the groundwork for research on the syllable in other languages, whether linguistically or typologically related to Italian.</p><p>One of the studies that address automatic syllabification in Italian is that of Bigi and Petrone <ref type="bibr" target="#b15">[16]</ref>, who proposed a tool that performs rule-based automatic segmentation. Adsett and Marchand <ref type="bibr" target="#b16">[17]</ref> and Adsett et al. <ref type="bibr" target="#b17">[18]</ref> investigated whether data-driven approaches outperform rule-based ones for a language with low syllabic complexity, such as Italian, and concluded that even in this case data-driven systems are the more appropriate approach. In terms of machine learning, automatically inferring syllable boundaries and predicting stress assignment can naturally be framed as sequence labeling problems. While automatic syllabification has received more attention recently <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b20">21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b23">24]</ref>, stress placement has not been investigated as much <ref type="bibr" target="#b24">[25]</ref>.</p><p>Given the complexity of syllable applications and of word syllabification, electronic resources dedicated to them become a necessity. While native speakers generally have little difficulty dividing words into syllables, the same cannot be said of foreign-language learners, who often tend to apply the rules of their own language to foreign words; similar problems arise in automatic syllabification. This is because syllabification rules are linguistic rules, and they cannot always be easily modeled by a computer, since they often depend on linguistic information that the written form alone does not provide. 
For example, a rule present in many languages distinguishes between a vowel and a semivowel, but a computer cannot easily recognize when the same letter stands for a vowel and when it stands for a semivowel. Because of this, rule-based adaptations of syllabification systems <ref type="bibr" target="#b25">[26]</ref> generally have higher error rates, and many languages still lack an automatic syllabification system (for example, available Python libraries provide syllabification for only a few languages). The last few decades have brought the first data-driven syllabification systems.</p><p>However, building such a system requires training data, and in many cases the available data do not cover the whole language, so the systems yield different results when the test corpus changes.</p><p>Starting from these remarks, our main contributions are:</p><p>• We propose ItGraSyll (Italian graphical syllables), a dataset of 114,503 Italian words in orthographic form, annotated with their orthographic syllabification and stress placement<ref type="foot" target="#foot_0">1</ref>.</p><p>• We perform quantitative and qualitative analyses of this dataset.</p><p>• We analyze stress placement in the context of Italian syllables.</p><p>• We propose an automatic system for the syllabification and stress assignment of Italian words.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Quantitative Analysis</head><p>In this section we perform various measurements of the syllables and stress placement of Italian written words and analyze the results. The investigation is similar to those previously conducted on Romanian by Dinu and Dinu <ref type="bibr" target="#b26">[27]</ref>, <ref type="bibr" target="#b27">[28]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Data</head><p>We built a dataset of Italian words starting from the online version of the Dizionario italiano De Mauro<hi rend="superscript">2</hi>, which provides graphical syllabification and stress placement for the Italian vocabulary. Stressed syllables are marked by an accent on the stressed vowel. Henceforth, this dataset will be referred to as ItGraSyll. We performed several pre-processing steps: we removed duplicates, as well as prefixes and suffixes, in order to keep only base words; we also excluded abbreviations and unwanted punctuation marks such as dots, commas, apostrophes and dashes, so that each word and its syllable division can be processed correctly. The final dataset consists of 114,503 words in orthographic form, having between one and eleven syllables. The distribution of words per number of syllables is shown in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
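<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pre-processing above starts from hyphenated, accent-marked entries such as those shown in Table 1. The following is a minimal Python sketch, not the authors' pipeline, of how one such entry could be split into its syllables, the index of its stressed syllable and its plain letter sequence. It assumes the annotation format visible in Table 1 (hyphens between syllables, a grave or acute accent on the stressed vowel); the function names are hypothetical.</p>

```python
import unicodedata
from collections import Counter

# Hypothetical set of stress-marked vowels; assumes entries mark the
# stressed vowel with a grave or acute accent, as in Table 1.
ACCENTED = set("àèéìíòóùú")


def strip_accents(text):
    """Remove combining diacritics, e.g. 'tà' becomes 'ta'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def parse_entry(entry):
    """Split an entry such as 'a-bi-tà-co-lo' into its syllables, the index
    of the stressed syllable (None if no accent is present) and the plain word."""
    syllables = unicodedata.normalize("NFC", entry).split("-")
    stressed = None
    for index, syllable in enumerate(syllables):
        if any(ch in ACCENTED for ch in syllable):
            stressed = index
            break
    plain = strip_accents("".join(syllables))
    return syllables, stressed, plain


def syllable_histogram(entries):
    """Count words per number of syllables, as in Table 1."""
    return Counter(len(parse_entry(entry)[0]) for entry in entries)


print(parse_entry("a-bi-tà-co-lo"))  # (['a', 'bi', 'tà', 'co', 'lo'], 2, 'abitacolo')
```

<p>Note that strip_accents also removes accents that belong to standard Italian spelling (e.g. the final accent of città), so the plain form is only suitable as a model input if such words are handled consistently.</p></div>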
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Syllables</head><p>To characterize the average length of a syllable measured in letters, we investigated two cases: a) the 483,931 token syllables contain 1,133,515 letters in total, so their average length is $L_{Syl}^{token} = 1{,}133{,}515 / 483{,}931 \approx 2.342$; b) the 3,730 type syllables contain $\#TypeSyl_{let} = 13{,}576$ letters, so the average length of a type syllable is $L_{Syl}^{type} = 13{,}576 / 3{,}730 \approx 3.640$.</p><p>These statistics are computed over the words extracted from the dictionary, all weighted equally, and therefore exclude any information about the frequency of the words in writing or speech. In future research, large corpora of Italian texts can be leveraged to recompute these values with frequency-based weights.</p><p>The 20 most frequent syllables are listed in Table <ref type="table" target="#tab_3">2</ref>.</p></div>
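<div xmlns="http://www.tei-c.org/ns/1.0"><p>The token and type averages above differ only in whether repeated syllables are counted once or every time they occur. A short Python sketch of both computations, assuming each word is given as a list of syllables with stress marks already removed (the function name is hypothetical), followed by a check of the reported ratios from their totals:</p>

```python
def average_syllable_lengths(words):
    """Return (token average, type average) syllable length in letters.

    `words` is a list of syllabified words, each given as a list of syllables
    with stress marks already removed (an assumption of this sketch)."""
    tokens = [syllable for word in words for syllable in word]
    types = set(tokens)  # each distinct syllable counted once
    token_avg = sum(len(s) for s in tokens) / len(tokens)
    type_avg = sum(len(s) for s in types) / len(types)
    return token_avg, type_avg


# The ratios reported above, recomputed from their totals:
print(round(1_133_515 / 483_931, 3))  # 2.342
print(round(13_576 / 3_730, 3))       # 3.64
```
</div>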
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Syllable Structure</head><p>We identified a total of 67 different consonant-vowel structures; the 7 most frequent ones cover almost 97% of the total. Distinguishing between type and token syllables, the most frequent consonant-vowel structures are: a) for type syllables: cvc (25%), ccvc (20.9%) and cvvc (7.79%); b) for token syllables: cv (58%), cvc (15%), ccv (7%), cvv (4.74%) and v (4.32%). Moreover, the cv structure corresponds to 40 of the 50 most frequent syllables in the dataset.</p></div>
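<div xmlns="http://www.tei-c.org/ns/1.0"><p>A consonant-vowel structure can be obtained by mapping every letter of a syllable to c or v. The Python sketch below assumes a purely orthographic vowel set, which is a simplification: it does not distinguish semivowel readings of i and u, and the paper does not state how such cases were classified. The function names are hypothetical.</p>

```python
from collections import Counter

# Letters treated as vowels; every other letter counts as a consonant.
# Simplification: semivowel readings of i/u (e.g. in 'glio') count as vowels.
VOWELS = set("aeiouàèéìíòóùú")


def cv_structure(syllable):
    """Map a syllable to its consonant-vowel pattern, e.g. 'men' -> 'cvc'."""
    return "".join("v" if ch in VOWELS else "c" for ch in syllable.lower())


def structure_shares(syllables):
    """Relative frequency of each pattern, most frequent first.

    Pass every syllable occurrence for token statistics, or set(syllables)
    for type statistics."""
    counts = Counter(cv_structure(s) for s in syllables)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.most_common()}


print(structure_shares(["to", "re", "men", "a"]))
```

<p>Running the same function on the list of syllable tokens and on the set of syllable types yields the two distributions reported above.</p></div>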
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Stress Placement</head><p>Of the 3,730 type syllables, 2,883 occur stressed at least once, so 847 syllables are never stressed. The 20 most frequent stressed syllables are shown in Table <ref type="table" target="#tab_4">3</ref>.</p><p>The most frequent stressed syllable, men, has a very high stress ratio (90%), i.e., 90% of all its occurrences in our database are stressed. While men is the only syllable of length 3 among the 20 most frequent syllables overall (in 14th position), the 20 most frequent stressed syllables include other syllables longer than two letters (zio in 6th position, with a 34% stress ratio, and gia in 19th position, with a 65% stress ratio). We also investigate the position of the stress within the word: Table <ref type="table">4</ref> gives the percentages of words stressed on each of the five most common positions, counting syllables both from the beginning and from the end of the word. In most cases, the stress falls on the penultimate syllable.</p></div>
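<div xmlns="http://www.tei-c.org/ns/1.0"><p>The stress ratio of a syllable is the share of its occurrences that are stressed, and stress position can be counted from the end of the word (1 for the final syllable, 2 for the penultimate, and so on). A minimal Python sketch of both statistics, assuming each word is given as a list of accent-free syllables together with the 0-based index of its stressed syllable; the input format and function name are hypothetical.</p>

```python
from collections import Counter


def stress_statistics(words):
    """words: list of (syllables, stressed_index) pairs, with syllables
    already stripped of accents and stressed_index counted from 0.
    Returns per-syllable stress ratios and the distribution of stress
    positions counted from the end of the word (1 = final syllable)."""
    total = Counter()
    stressed = Counter()
    from_end = Counter()
    for syllables, index in words:
        total.update(syllables)  # every occurrence, stressed or not
        if index is None:
            continue
        stressed[syllables[index]] += 1
        from_end[len(syllables) - index] += 1
    ratios = {syl: stressed[syl] / total[syl] for syl in stressed}
    return ratios, from_end
```

<p>Syllables that never receive stress do not appear in the returned ratios, which matches the distinction above between the 2,883 stressed and 847 never-stressed type syllables.</p></div>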
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>The 20 most frequent stressed syllables (columns: index, syllable, frequency, stress ratio in %).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>Stress placement for Italian.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.">Syllables' Usage</head><p>Syllables show a somewhat counter-intuitive behaviour: a small number of syllables usually covers a large part of a language. This holds for a wide range of natural languages, including English, Dutch, Romanian <ref type="bibr" target="#b27">[28]</ref>, Korean and Chinese. We investigate whether this empirical law also applies to Italian, considering both general and stressed syllables.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.1.">General Syllables</head><p>The 30 most frequent Italian syllables (disregarding stress placement) cover almost 50% of $\#Token_{syl}$, the 50 most frequent cover 61%, the 100 most frequent cover 74%, and the 150 most frequent (i.e., 4% of $\#Type_{syl}$) cover 80% of $\#Token_{syl}$. Beyond this point, coverage rises slowly. Of the type syllables, 2,281 (61%) occur fewer than 10 times, and 1,174 occur only once (hapax legomena).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.2.">Stressed Syllables</head><p>A similar trend can be observed for stressed syllables, where the most frequent syllables also account for a large share of the total syllable frequency: the 10 most frequent stressed syllables represent 31% of all stressed syllable tokens, the top 50 represent 60%, and the top 200 represent 81%.</p><p>The values are plotted in Figure <ref type="figure" target="#fig_0">1</ref> for all syllables and for stressed syllables. These results indicate that the law holds for Italian too: a very small number of syllables covers a large part of the language (only 150 syllables are needed to cover 80% of the syllable tokens in the dataset).</p></div>
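<div xmlns="http://www.tei-c.org/ns/1.0"><p>The coverage figures above are cumulative shares of syllable tokens accounted for by the k most frequent syllable types. A minimal Python sketch of the coverage curve and of the rare-syllable counts, assuming the input is the flat list of syllable tokens (all syllables for the general case, only stressed ones for the stressed case); the function names are hypothetical.</p>

```python
from collections import Counter


def coverage(token_syllables, top_k):
    """Share of syllable tokens covered by the top_k most frequent types."""
    counts = Counter(token_syllables)
    covered = sum(n for _, n in counts.most_common(top_k))
    return covered / sum(counts.values())


def rare_types(token_syllables, threshold=10):
    """Return (#types occurring fewer than `threshold` times, #hapax legomena)."""
    counts = Counter(token_syllables)
    rare = sum(1 for n in counts.values() if threshold > n)
    hapax = sum(1 for n in counts.values() if n == 1)
    return rare, hapax
```

<p>Evaluating coverage for k = 30, 50, 100 and 150 on the full list of tokens should reproduce the curve in Figure 1, under the assumption that ties in frequency do not change the ranking materially.</p></div>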
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Minimum Effort Laws</head><p>In this section we discuss two minimum effort laws that have previously been investigated for other languages, and verify whether they also apply to Italian.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Chebanow</head><p>Denoting by $F(n)$ the frequency of words having $n$ syllables and by $i = \sum_{n} n F(n) / \sum_{n} F(n)$ the average word length (measured in syllables), Chebanow <ref type="bibr" target="#b28">[29]</ref> proposed the following law relating the average $i$ to the probability $P(n)$ of occurrence of words having $n$ syllables:</p><formula xml:id="formula_0">$P(n) = \frac{(i-1)^{n-1}}{(n-1)!} e^{1-i}$ (<label>1</label>)</formula><p>For Italian, $i = 4.226$.</p></div>
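<div xmlns="http://www.tei-c.org/ns/1.0"><p>Equation (1) is a Poisson distribution with mean $i - 1$, shifted by one so that $n \geq 1$. The Python sketch below computes $i$ from the word counts of Table 1 and compares the observed proportions with the values predicted by the law; it is an illustration of the formula, not the authors' code, and the function names are hypothetical.</p>

```python
import math

# Number of words per number of syllables, copied from Table 1.
TABLE_1 = {1: 722, 2: 5960, 3: 23286, 4: 41253, 5: 28357, 6: 10829,
           7: 3294, 8: 650, 9: 132, 10: 16, 11: 5}


def mean_length(freq):
    """i = sum(n * F(n)) / sum(F(n)), the mean word length in syllables."""
    return sum(n * f for n, f in freq.items()) / sum(freq.values())


def chebanov(n, i):
    """P(n) = (i - 1)^(n - 1) / (n - 1)! * exp(1 - i)."""
    return (i - 1) ** (n - 1) / math.factorial(n - 1) * math.exp(1 - i)


i = mean_length(TABLE_1)
total = sum(TABLE_1.values())
for n in sorted(TABLE_1):
    observed = TABLE_1[n] / total
    print(f"{n:2d}  observed={observed:.4f}  predicted={chebanov(n, i):.4f}")
```

<p>With the counts of Table 1, the mean length rounds to 4.226, the value reported above.</p></div>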
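<div xmlns="http://www.tei-c.org/ns/1.0"><table><row role="label"><cell>Model</cell><cell>Hyphen Acc.</cell><cell>Hyphen F1</cell><cell>Word Acc.</cell></row><row><cell>GRU for syllabification w/o stress markers</cell><cell>99.74%</cell><cell>99.69%</cell><cell>97.61%</cell></row><row><cell>GRU for syllabification w/ stress markers</cell><cell>99.82%</cell><cell>99.79%</cell><cell>98.41%</cell></row><row><cell>GRU for stress prediction</cell><cell>-</cell><cell>-</cell><cell>94.45%</cell></row></table></div>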
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 5</head><p>Performance metrics computed for automatic syllabification and stress prediction on the test set. Accuracy and F1 scores were computed on the sequence labelling predictions for syllabification, to assess how well the model predicts the positions where syllables split. Word-level metrics were computed for both syllabification and stress prediction; they are stricter, since a single misplaced hyphen makes the entire syllabification of a word wrong.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><p>In Figures <ref type="figure" target="#fig_2">2a and 2b</ref> we plot the probability distribution of word length (in syllables): the empirical distribution and the theoretical one predicted by Chebanow's law, respectively.</p><p>The two curves have comparable shapes, with a more prominent peak in the empirical distribution (Figure <ref type="figure" target="#fig_2">2a</ref>); this peak may be due to the distribution being computed over all the words in the dictionary, which contains many 4-syllable words.</p></div>
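<div xmlns="http://www.tei-c.org/ns/1.0"><p>The metrics in Table 5 can be computed directly from the 0/1 label strings used for sequence labelling. A minimal Python sketch, assuming hyphen-level scores are taken over every character position (the paper does not say whether the always-positive first letter is excluded) and word-level accuracy is an exact match of the whole sequence; the function names are hypothetical.</p>

```python
def hyphen_metrics(gold_labels, pred_labels):
    """Character-level accuracy and F1 (positive class = syllable start).

    Both arguments are lists of 0/1 label strings of matching lengths, e.g.
    '1010100010'. Every position is scored, including the first letter,
    which is always a syllable start; this choice is an assumption."""
    tp = fp = fn = correct = total = 0
    for gold, pred in zip(gold_labels, pred_labels):
        for g, p in zip(gold, pred):
            total += 1
            correct += g == p
            tp += g == "1" and p == "1"
            fp += g == "0" and p == "1"
            fn += g == "1" and p == "0"
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return correct / total, f1


def word_accuracy(gold_labels, pred_labels):
    """Share of words whose complete label sequence is predicted exactly."""
    hits = sum(g == p for g, p in zip(gold_labels, pred_labels))
    return hits / len(gold_labels)
```

<p>A single wrong label lowers hyphen accuracy only slightly but makes the whole word count as an error, which is why word-level accuracy is the stricter measure.</p></div>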
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Menzerath</head><p>Menzerath's law, later generalized as the Menzerath-Altmann law <ref type="bibr" target="#b29">[30]</ref>, states that the more syllables a word has, the fewer phonemes its syllables contain. In other words, it expresses a negative correlation between the length of a word in syllables and the length in phonemes of its constituent syllables. In terms of cognitive economy, the more complex a linguistic construct, the smaller its constituents. The law is expressed as follows:</p><formula xml:id="formula_1">$y = \alpha x^{\beta} e^{-\gamma x}$ (<label>2</label>)</formula><p>where $y$ is the syllable length (the size of the constituent), $x$ is the number of syllables per word (the size of the linguistic construct), and $\alpha$, $\beta$, $\gamma$ are empirical parameters. Figure <ref type="figure" target="#fig_2">2c</ref> shows that the law holds for Italian, with syllable length measured here in letters, since ItGraSyll contains only orthographic forms.</p></div>
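<div xmlns="http://www.tei-c.org/ns/1.0"><p>Taking logarithms turns Equation (2) into a model that is linear in its parameters, $\ln y = \ln\alpha + \beta \ln x - \gamma x$, which can be fitted by ordinary least squares. The Python sketch below uses that log-linear fit, a common choice but not necessarily the procedure behind Figure 2c, and solves the 3x3 normal equations with Cramer's rule to stay dependency-free. Here xs would be word lengths in syllables and ys the mean syllable length in letters for each word length; the function names are hypothetical.</p>

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_menzerath(xs, ys):
    """Least-squares fit of ln y = ln(alpha) + beta*ln x - gamma*x.

    Returns (alpha, beta, gamma); needs at least 3 distinct x values."""
    rows = [(1.0, math.log(x), -float(x)) for x in xs]
    targets = [math.log(y) for y in ys]
    # Normal equations: (A^T A) p = A^T t.
    ata = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    att = [sum(r[a] * t for r, t in zip(rows, targets)) for a in range(3)]
    d = det3(ata)
    params = []
    for k in range(3):
        # Cramer's rule: replace column k with the right-hand side.
        mk = [[att[a] if b == k else ata[a][b] for b in range(3)] for a in range(3)]
        params.append(det3(mk) / d)
    log_alpha, beta, gamma = params
    return math.exp(log_alpha), beta, gamma
```

<p>Fitting in log space weights relative rather than absolute errors; a direct non-linear fit of Equation (2) could give slightly different parameters on real data.</p></div>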
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Automatic Syllabification and Stress Assignment</head><p>We further investigate how a deep-learning model can automatically infer the syllabification and stress assignment of Italian words, given their orthographic representation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Methodology</head><p>Both tasks can be framed as sequence labelling problems, a strategy previously used successfully for Romanian <ref type="bibr" target="#b30">[31,</ref><ref type="bibr" target="#b31">32]</ref>. Consider, for example, the word medaglione (Italian for "locket"). For syllabification, we label each letter of the word either with 1, meaning that a syllable starts at that letter, or with 0, meaning that the letter is not the first letter of its syllable. Similarly, to identify the stressed vowel, we label its position with 1 and all other letters with 0. For our example, we thus obtain the sequence 1010100010 for syllabification and the sequence 0000000100 for stress prediction (i.e., me-da-gliò-ne, where the o is stressed). With these definitions, we can build machine learning models that label character sequences. The model we propose is a recurrent neural network based on Gated Recurrent Units (GRU) <ref type="bibr" target="#b32">[33]</ref>. 
The model architecture consists of the following components:</p><p>• a character embedding layer, producing a 64-dimensional vector for each unique character;</p><p>• a stacked bidirectional GRU with 3 layers and a 128-dimensional hidden state, with 0.2-rate dropout applied after each of the first two layers;</p><p>• 0.5-rate dropout after the last GRU layer, together with one-dimensional batch normalization;</p><p>• a time-distributed fully-connected layer with 256 output nodes and ReLU activation;</p><p>• a linear layer that projects the 256-dimensional vector onto a single number, followed by a sigmoid activation to infer the binary labels.</p><p>For both tasks, the dataset of words is split into 50% training examples and 50% test examples, the latter unseen during training.</p><p>For a given word, regardless of the task on which the model is trained, the loss function is the average of two terms: the average character-wise binary cross-entropy, and the root mean squared error between the vector of predicted labels and the ground-truth vector. The model is optimized using the Adam optimizer <ref type="bibr" target="#b33">[34]</ref> with a learning rate of 0.0003, no weight decay, a batch size of 32, and a learning-rate scheduler that halves the learning rate every 5 epochs. The models are trained for 10-15 epochs.</p><p>For automatic syllabification, we wanted to check whether the presence of stress markers affects the performance of the model. We therefore trained two models: the first on the spelling of the words with the stress markers removed, and the second with the stress markers included.</p></div>
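<div xmlns="http://www.tei-c.org/ns/1.0"><p>The labelling scheme and the loss described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the helper names are hypothetical, the loss is computed for a single word on predicted probabilities (the batching and reduction used in training are not specified in the paper), and the GRU network itself is not reproduced.</p>

```python
import math


def syllabification_labels(hyphenated):
    """'me-da-glio-ne' -> '1010100010': 1 marks the first letter of a syllable."""
    return "".join("1" + "0" * (len(s) - 1) for s in hyphenated.split("-"))


def stress_labels(word, stressed_index):
    """Mark only the stressed vowel (given by character index) with 1."""
    return "".join("1" if i == stressed_index else "0" for i in range(len(word)))


def decode_syllables(word, labels):
    """Insert a hyphen before every non-initial letter labelled 1."""
    out = []
    for i, (ch, lab) in enumerate(zip(word, labels)):
        if lab == "1" and i:
            out.append("-")
        out.append(ch)
    return "".join(out)


def combined_loss(probs, labels, eps=1e-7):
    """Average of character-wise binary cross-entropy and the RMSE between
    predicted probabilities and the 0/1 targets of one word."""
    targets = [float(lab) for lab in labels]
    clipped = [min(max(p, eps), 1 - eps) for p in probs]  # avoid log(0)
    bce = -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
               for p, t in zip(clipped, targets)) / len(targets)
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(probs, targets)) / len(targets))
    return (bce + rmse) / 2


print(syllabification_labels("me-da-glio-ne"))  # 1010100010
print(stress_labels("medaglione", 7))           # 0000000100
```

<p>At inference time, a decoder such as decode_syllables would turn thresholded sigmoid outputs back into a hyphenated word.</p></div>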
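<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 6</head><p>Examples of erroneous test predictions provided by the deep-learning models.</p><table><head>Stress Assignment Errors</head><row role="label"><cell>True</cell><cell>Predicted</cell></row><row><cell>bàlano</cell><cell>balanò</cell></row><row><cell>fèmore</cell><cell>femòre</cell></row><row><cell>dòlmen</cell><cell>dolmèn</cell></row><row><cell>tùtolo</cell><cell>tutòlo</cell></row><row><cell>pudìco</cell><cell>pùdico</cell></row><row><cell>corsìa</cell><cell>còrsia</cell></row></table><table><head>Syllabification Errors</head><row role="label"><cell>True</cell><cell>Predicted</cell></row><row><cell>mu-o-ne</cell><cell>muo-ne</cell></row><row><cell>bion-da</cell><cell>bi-on-da</cell></row><row><cell>cli-en-te</cell><cell>clien-te</cell></row><row><cell>co-di-a-to</cell><cell>co-dia-to</cell></row><row><cell>ma-nu-brio</cell><cell>ma-nu-bri-o</cell></row><row><cell>spa-tria-to</cell><cell>spa-tri-a-to</cell></row></table></div>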
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Results Analysis</head><p>Table <ref type="table">5</ref> contains the metrics computed on the test set for the syllabification models (with and without stress markers) and for the model trained to predict the stressed vowel. Without stress markers, syllabification reaches a hyphen-level accuracy of 99.74%; adding the stress markers raises it to 99.82%.</p><p>Including the stress markers in the syllabification data improved the metrics across the board, most notably word-level accuracy, which rose by 0.8 percentage points (from 97.61% to 98.41%). Given the size of the test set and the already high accuracy, this is a substantial improvement: 460 fewer syllabification mistakes than with the approach that excludes stress markers. For stress prediction, we obtained a word-level accuracy of 94.45%. Table <ref type="table">6</ref> shows a series of wrong predictions generated by the models on the test sets for stress assignment and syllabification.</p><p>We also examine the accuracy scores on the test set when it is bucketed by the true number of syllables of the test words. These results are shown in Figure <ref type="figure" target="#fig_3">3</ref> and Table <ref type="table" target="#tab_5">7</ref>. For stress assignment, accuracy decreases to a global minimum for disyllabic words, then increases again with the number of syllables. For syllabification, including the stress markers outperforms excluding them in most cases, and both accuracies peak at around five syllables. This seems to align with the distribution of syllables in the dataset, i.e., scores are higher for syllable counts with more examples. 
For stress assignment errors, we also investigate where the predicted stressed syllable lies relative to the true one (see Table <ref type="table" target="#tab_6">8</ref>). For each incorrect prediction, we compute how far the assigned stress is from the actual one, in number of syllables (delta). A delta of −2 means that the predicted stressed syllable is the second one to the left of the correct stressed syllable, while a delta of 0 means that the algorithm predicted the stressed vowel incorrectly, but the prediction lies inside the correct stressed syllable. Of the erroneous predictions, 95.6% place the stress at most one syllable to the left or right of the correct one (a delta between −1 and 1), and almost two thirds place it on the syllable immediately to the right of the correct one (a delta of 1).</p></div>
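<div xmlns="http://www.tei-c.org/ns/1.0"><p>The delta analysis and the per-length accuracies can be sketched as follows in Python. The sketch assumes that the stress model outputs a character index, that syllable boundaries are taken from the ground-truth syllabification, and that the sign convention is predicted minus true (so negative values lie to the left); all function names are hypothetical.</p>

```python
from collections import Counter, defaultdict


def syllable_of(hyphenated, char_index):
    """Index of the syllable that contains the letter at char_index."""
    end = 0
    for syllable_index, syllable in enumerate(hyphenated.split("-")):
        end += len(syllable)
        if end > char_index:
            return syllable_index
    raise IndexError("character index outside the word")


def stress_delta(hyphenated, true_char, pred_char):
    """Predicted minus true stressed-syllable index; negative means the
    prediction lies to the left of the correct stressed syllable."""
    return syllable_of(hyphenated, pred_char) - syllable_of(hyphenated, true_char)


def delta_distribution(errors):
    """errors: (hyphenated word, true char index, predicted char index) triples."""
    return Counter(stress_delta(*error) for error in errors)


def accuracy_by_length(results):
    """results: (number of syllables, prediction is correct) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for n_syllables, correct in results:
        totals[n_syllables] += 1
        hits[n_syllables] += bool(correct)
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# bàlano predicted as balanò: stress moved two syllables to the right.
print(stress_delta("ba-la-no", 1, 5))  # 2
```

<p>For the pairs in Table 6, fèmore predicted as femòre gives a delta of 1 and pudìco predicted as pùdico gives a delta of −1.</p></div>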
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>In this paper we have investigated graphical syllabification and graphical stress assignment for Italian words.</p><p>We started by building ItGraSyll, a dataset of graphically syllabified Italian words with stress annotations, on which we performed several quantitative and qualitative analyses, including the verification of two minimum effort laws for Italian. Finally, we proposed a recurrent neural network model for the automatic syllabification and stress assignment of Italian written words. For stress prediction we obtained 94.45% word-level accuracy, and for syllabification 98.41% word-level accuracy and 99.82% hyphen-level accuracy. In future work we intend to extend the analysis from the dictionary level to the corpus level and to investigate other languages as well.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Coverage of the most frequent syllables.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>(a)</head><label>a</label><figDesc>(a) The probability distribution of the length of words. (b) Theoretical representation of the probability distribution of the length of words. (c) Menzerath's law: the more syllables in a word, the smaller its syllables.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Minimum effort laws.</figDesc><graphic coords="5,89.29,95.52,129.18,84.75" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The test accuracies for each of the three tasks, computed independently on the test words, bucketed by their true number of syllables.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Number of words per number of syllables.</figDesc><table><row><cell>#syll.</cell><cell>#words</cell><cell>Examples</cell></row><row><cell>1</cell><cell>722</cell><cell>ai</cell></row><row><cell>2</cell><cell>5,960</cell><cell>àc-cia</cell></row><row><cell>3</cell><cell>23,286</cell><cell>àb-ba-co</cell></row><row><cell>4</cell><cell>41,253</cell><cell>a-ba-chì-sta</cell></row><row><cell>5</cell><cell>28,357</cell><cell>a-bi-tà-co-lo</cell></row><row><cell>6</cell><cell>10,829</cell><cell>ac-cu-mu-la-zió-ne</cell></row><row><cell>7</cell><cell>3,294</cell><cell>au-ten-ti-fi-ca-zió-ne</cell></row><row><cell>8</cell><cell>650</cell><cell>a-e-ro-mo-del-lì-sti-co</cell></row><row><cell>9</cell><cell>132</cell><cell>bi-o-me-te-o-ro-lo-gì-a</cell></row><row><cell>10</cell><cell>16</cell><cell>in-tel-let-tu-a-li-sti-ca-mén-te</cell></row><row><cell>11</cell><cell>5</cell><cell>ge-ne-ra-ti-vo-tra-sfor-ma-zio-nà-le</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head></head><label></label><figDesc></figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2</head><label>2</label><figDesc>Top 20 most frequent syllables.</figDesc><table><row><cell>Rank</cell><cell>Syllable</cell><cell>Frequency</cell></row><row><cell>1</cell><cell>to</cell><cell>23943</cell></row><row><cell>2</cell><cell>re</cell><cell>18199</cell></row><row><cell>3</cell><cell>ta</cell><cell>12796</cell></row><row><cell>4</cell><cell>te</cell><cell>10987</cell></row><row><cell>5</cell><cell>si</cell><cell>10026</cell></row><row><cell>6</cell><cell>a</cell><cell>9142</cell></row><row><cell>7</cell><cell>co</cell><cell>8874</cell></row><row><cell>8</cell><cell>ri</cell><cell>8868</cell></row><row><cell>9</cell><cell>ca</cell><cell>8478</cell></row><row><cell>10</cell><cell>ra</cell><cell>8388</cell></row><row><cell>11</cell><cell>na</cell><cell>8367</cell></row><row><cell>12</cell><cell>ti</cell><cell>8184</cell></row><row><cell>13</cell><cell>ne</cell><cell>8112</cell></row><row><cell>14</cell><cell>men</cell><cell>7841</cell></row><row><cell>15</cell><cell>la</cell><cell>7175</cell></row><row><cell>16</cell><cell>di</cell><cell>6663</cell></row><row><cell>17</cell><cell>le</cell><cell>6555</cell></row><row><cell>18</cell><cell>li</cell><cell>6176</cell></row><row><cell>19</cell><cell>no</cell><cell>5748</cell></row><row><cell>20</cell><cell>lo</cell><cell>5479</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3</head><label>3</label><figDesc>Top 20 most frequent stressed syllables. The stress ratio is the percentage of all occurrences of the syllable in the corpus in which it is stressed.</figDesc><table><row><cell>Rank</cell><cell>Syllable</cell><cell>Stressed occurrences</cell><cell>Stress ratio (%)</cell></row><row><cell>1</cell><cell>men</cell><cell>7120</cell><cell>90</cell></row><row><cell>2</cell><cell>ta</cell><cell>5809</cell><cell>45</cell></row><row><cell>3</cell><cell>na</cell><cell>3348</cell><cell>40</cell></row><row><cell>4</cell><cell>to</cell><cell>3254</cell><cell>15</cell></row><row><cell>5</cell><cell>la</cell><cell>2978</cell><cell>41</cell></row><row><cell>6</cell><cell>zio</cell><cell>2916</cell><cell>76</cell></row><row><cell>7</cell><cell>ti</cell><cell>2820</cell><cell>34</cell></row><row><cell>8</cell><cell>ca</cell><cell>2461</cell><cell>29</cell></row><row><cell>9</cell><cell>ra</cell><cell>2297</cell><cell>27</cell></row><row><cell>10</cell><cell>li</cell><cell>2239</cell><cell>36</cell></row><row><cell>11</cell><cell>ri</cell><cell>2100</cell><cell>24</cell></row><row><cell>12</cell><cell>tu</cell><cell>2024</cell><cell>62</cell></row><row><cell>13</cell><cell>za</cell><cell>2022</cell><cell>42</cell></row><row><cell>14</cell><cell>ni</cell><cell>1734</cell><cell>40</cell></row><row><cell>15</cell><cell>tri</cell><cell>1458</cell><cell>60</cell></row><row><cell>16</cell><cell>ma</cell><cell>1209</cell><cell>25</cell></row><row><cell>17</cell><cell>si</cell><cell>1144</cell><cell>11</cell></row><row><cell>18</cell><cell>da</cell><cell>1109</cell><cell>43</cell></row><row><cell>19</cell><cell>gia</cell><cell>1081</cell><cell>65</cell></row><row><cell>20</cell><cell>mi</cell><cell>1052</cell><cell>25</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head></head><label></label><figDesc>Number of words by position of the stressed syllable: (a) counting syllables from the beginning of the word; (b) counting syllables from the end of the word.</figDesc><table><row><cell>Syllable</cell><cell>#words (a)</cell><cell>#words (b)</cell></row><row><cell>1st</cell><cell>8,611</cell><cell>3,330</cell></row><row><cell>2nd</cell><cell>25,544</cell><cell>94,225</cell></row><row><cell>3rd</cell><cell>40,568</cell><cell>16,113</cell></row><row><cell>4th</cell><cell>25,593</cell><cell>14</cell></row><row><cell>5th</cell><cell>9,243</cell><cell>1</cell></row></table></figure>
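The stress ratio defined in the caption of Table 3 can be illustrated with a short sketch. It assumes, hypothetically, that stressed syllables are written with a grave or acute accent and that a syllable's stressed and unstressed forms are grouped once the accent is removed; the helper name and the toy words are for illustration only.

```python
import unicodedata
from collections import Counter

# Combining grave and acute accents (assumed stress markers after NFD decomposition).
STRESS_ACCENTS = ("\u0300", "\u0301")


def stress_ratios(words):
    """Map each unaccented syllable to the percentage of its occurrences that are stressed."""
    total, stressed = Counter(), Counter()
    for word in words:
        for syllable in unicodedata.normalize("NFD", word).split("-"):
            plain = syllable
            for accent in STRESS_ACCENTS:
                plain = plain.replace(accent, "")
            total[plain] += 1
            if plain != syllable:  # an accent was removed, so this occurrence is stressed
                stressed[plain] += 1
    return {syl: 100 * stressed[syl] / total[syl] for syl in total}


# Toy words taken from Table 1; a real run would use the full ItGraSyll word list.
ratios = stress_ratios(["àb-ba-co", "a-bi-tà-co-lo", "ac-cu-mu-la-zió-ne"])
print(ratios["co"], ratios["ta"], ratios["zio"])  # 0.0 100.0 100.0
```

Decomposing to NFD lets the same code handle both accent types without listing every accented vowel.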
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 7</head><label>7</label><figDesc>Similar to Figure 3, this table contains the actual values of the test accuracies for the three tasks: stress assignment, and syllabification with and without stress markers (SM). These scores are computed separately for words with the same number of syllables.</figDesc><table><row><cell>#syll.</cell><cell>#words</cell><cell>Stress assignment</cell><cell>Syllabification with SM</cell><cell>Syllabification without SM</cell></row><row><cell>1</cell><cell>721</cell><cell>99.03%</cell><cell>83.63%</cell><cell>84.88%</cell></row><row><cell>2</cell><cell>5,960</cell><cell>92.94%</cell><cell>96.56%</cell><cell>97.80%</cell></row><row><cell>3</cell><cell>23,286</cell><cell>94.46%</cell><cell>98.55%</cell><cell>99.19%</cell></row><row><cell>4</cell><cell>41,253</cell><cell>97.42%</cell><cell>99.03%</cell><cell>99.48%</cell></row><row><cell>5</cell><cell>28,357</cell><cell>98.92%</cell><cell>99.33%</cell><cell>99.49%</cell></row><row><cell>6</cell><cell>10,829</cell><cell>99.48%</cell><cell>99.23%</cell><cell>99.26%</cell></row><row><cell>7</cell><cell>3,294</cell><cell>99.67%</cell><cell>99.15%</cell><cell>99.15%</cell></row><row><cell>8</cell><cell>650</cell><cell>100.0%</cell><cell>99.23%</cell><cell>98.46%</cell></row><row><cell>9</cell><cell>132</cell><cell>100.0%</cell><cell>99.24%</cell><cell>99.24%</cell></row><row><cell>10</cell><cell>16</cell><cell>100.0%</cell><cell>93.75%</cell><cell>93.75%</cell></row><row><cell>11</cell><cell>5</cell><cell>100.0%</cell><cell>100.0%</cell><cell>100.0%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 8</head><label>8</label><figDesc>Distance, in syllables, between the predicted and the correct stressed syllable (delta) for the incorrect stress predictions.</figDesc><table><row><cell>Stressed syllable delta</cell><cell>Num. errors</cell><cell>Pct. errors</cell></row><row><cell>-2</cell><cell>21</cell><cell>0.74%</cell></row><row><cell>-1</cell><cell>804</cell><cell>28.38%</cell></row><row><cell>0</cell><cell>95</cell><cell>3.35%</cell></row><row><cell>1</cell><cell>1,809</cell><cell>63.85%</cell></row><row><cell>2</cell><cell>102</cell><cell>3.60%</cell></row><row><cell>3</cell><cell>2</cell><cell>0.07%</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">The dataset is available for research purposes upon request at: https://nlp.unibuc.ro/resources.html#itgrasyll</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://dizionario.internazionale.it/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>We thank the reviewers for their useful suggestions. This research was supported by the Ministry of Research, Innovation and Digitization, CNCS/CCCDI UEFISCDI, SiRoLa project, number PN-IV-P1-PCE-2023-1701, Romania.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Incorporating syllabification points into a model of grapheme-to-phoneme conversion</title>
		<author>
			<persName><forename type="first">S</forename><surname>Suyanto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Speech Technology</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="459" to="470" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Exploring emergent syllables in end-to-end automatic speech recognizers through model explainability technique</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">N</forename><surname>Vitale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cutugno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Origlia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Coro</surname></persName>
		</author>
		<idno type="DOI">10.1007/S00521-024-09435-1</idno>
		<ptr target="https://doi.org/10.1007/s00521-024-09435-1" />
	</analytic>
	<monogr>
		<title level="j">Neural Comput. Appl.</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="6875" to="6901" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">On the syllabic similarities of Romance languages</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-540-30586-6_88</idno>
		<ptr target="https://doi.org/10.1007/978-3-540-30586-6_88" />
	</analytic>
	<monogr>
		<title level="m">Computational Linguistics and Intelligent Text Processing, 6th International Conference, CICLing 2005</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Gelbukh</surname></persName>
		</editor>
		<meeting><address><addrLine>Mexico City, Mexico</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">February 13-19, 2005</date>
			<biblScope unit="volume">3406</biblScope>
			<biblScope unit="page" from="785" to="788" />
		</imprint>
	</monogr>
	<note>Proceedings</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease formula) for Navy enlisted personnel</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">P</forename><surname>Kincaid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">P</forename><surname>Fishburne</surname><genName>Jr</genName></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Rogers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">S</forename><surname>Chissom</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Research Branch Report</title>
				<meeting><address><addrLine>Millington, TN</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1975">1975</date>
		</imprint>
		<respStmt>
			<orgName>Chief of Naval Training</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Automated Metric Analysis of Spanish Poetry: Two Complementary Approaches</title>
		<author>
			<persName><forename type="first">G</forename><surname>Marco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>De La Rosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gonzalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>González-Blanco</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="51734" to="51746" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">On the Romanian rhyme detection</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Ciobanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2012: Demonstration Papers</title>
				<meeting>COLING 2012: Demonstration Papers</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="87" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The syllable as a structural unit</title>
		<author>
			<persName><forename type="first">L</forename><surname>Hjelmslev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Congress of Phonetic Sciences</title>
				<meeting><address><addrLine>Ghent</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1938">1938</date>
			<biblScope unit="volume">266</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">De</forename><surname>Saussure</surname></persName>
		</author>
		<title level="m">Course in general linguistics</title>
				<imprint>
			<publisher>Columbia University Press</publisher>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">The Notion of Syllable across History, Theories and Analysis</title>
		<author>
			<persName><forename type="first">D</forename><surname>Russo</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>Cambridge Scholars Publishing</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Syllable, segment and prosody</title>
		<author>
			<persName><forename type="first">M</forename><surname>Loporcaro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Cambridge history of the Romance languages</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="50" to="108" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><surname>Meyer-Lübke</surname></persName>
		</author>
		<title level="m">Grammaire des langues romanes</title>
				<imprint>
			<publisher>H. Welter</publisher>
			<date type="published" when="1906">1906</date>
			<biblScope unit="volume">4</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">M.-D</forename><surname>Glessgen</surname></persName>
		</author>
		<title level="m">Linguistique romane: domaines et méthodes en linguistique française et romane</title>
				<imprint>
			<publisher>Armand Colin</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">S</forename><surname>Miret</surname></persName>
		</author>
		<title level="m">Fonética histórica, in: Manual de lingüística románica</title>
				<meeting><address><addrLine>Ariel España</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="227" to="250" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Grammatica storica della lingua e dei dialetti italiani</title>
		<author>
			<persName><forename type="first">F</forename><surname>D&apos;Ovidio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Meyer-Lübke</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1906">1906</date>
			<biblScope unit="volume">368</biblScope>
			<publisher>U. Hoepli</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Rohlfs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Franceschi</surname></persName>
		</author>
		<title level="m">Grammatica storica della lingua italiana e dei suoi dialetti: Morfologia</title>
				<imprint>
			<date type="published" when="1968">1968</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">A generic tool for the automatic syllabification of Italian</title>
		<author>
			<persName><forename type="first">B</forename><surname>Bigi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Petrone</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="73" to="77" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Are Rule-based Syllabification Methods Adequate for Languages with Low Syllabic Complexity? The Case of Italian</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Adsett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Marchand</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Sixth ISCA Workshop on Speech Synthesis</title>
				<editor>
			<persName><forename type="first">P</forename><surname>Wagner</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Abresch</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Breuer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">W</forename><surname>Hess</surname></persName>
		</editor>
		<meeting><address><addrLine>Bonn, Germany</addrLine></address></meeting>
		<imprint>
			<publisher>ISCA</publisher>
			<date type="published" when="2007">August 22-24, 2007</date>
			<biblScope unit="page" from="58" to="63" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of Italian</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">R</forename><surname>Adsett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Marchand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Keselj</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.csl.2009.02.004</idno>
		<ptr target="https://doi.org/10.1016/j.csl.2009.02.004" />
	</analytic>
	<monogr>
		<title level="j">Comput. Speech Lang.</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="page" from="444" to="463" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Automatic syllabification using segmental conditional random fields</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">A</forename><surname>Rogova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Demuynck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Van Compernolle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics in the Netherlands Journal</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="34" to="48" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Romanian syllabication using machine learning</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Niculae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Sulea</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Text, Speech, and Dialogue - 16th International Conference, TSD 2013</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">I</forename><surname>Habernal</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Matousek</surname></persName>
		</editor>
		<meeting><address><addrLine>Pilsen, Czech Republic</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">September 1-5, 2013</date>
			<biblScope unit="volume">8082</biblScope>
			<biblScope unit="page" from="450" to="456" />
		</imprint>
	</monogr>
	<note>Proceedings</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Language-Agnostic Syllabification with Neural Sequence Labeling</title>
		<author>
			<persName><forename type="first">J</forename><surname>Krantz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">W</forename><surname>Dulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>De Palma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">18th IEEE International Conference on Machine Learning and Applications (ICMLA)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="804" to="810" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">On incrementing interpretability of machine learning models from the foundations: A study on syllabic speech units</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">N</forename><surname>Vitale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schettino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cutugno</surname></persName>
		</author>
		<ptr target="https://ceur-ws.org/Vol-3596/paper51.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th Italian Conference on Computational Linguistics</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">F</forename><surname>Boschetti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Lebani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Magnini</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</editor>
		<meeting>the 9th Italian Conference on Computational Linguistics<address><addrLine>Venice, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023-12-02">November 30-December 2, 2023</date>
			<biblScope unit="volume">3596</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Full inflection learning using deep neural networks</title>
		<author>
			<persName><forename type="first">O</forename><surname>Sulea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dumitru</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-23793-5_33</idno>
		<ptr target="https://doi.org/10.1007/978-3-031-23793-5_33" />
	</analytic>
	<monogr>
		<title level="m">Computational Linguistics and Intelligent Text Processing - 19th International Conference, CICLing 2018</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">A</forename><forename type="middle">F</forename><surname>Gelbukh</surname></persName>
		</editor>
		<meeting><address><addrLine>Hanoi, Vietnam</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">March 18-24, 2018</date>
			<biblScope unit="volume">13396</biblScope>
			<biblScope unit="page" from="408" to="415" />
		</imprint>
	</monogr>
	<note>Revised Selected Papers, Part I</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A syllable segmentation algorithm for English and Italian</title>
		<author>
			<persName><forename type="first">M</forename><surname>Petrillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Cutugno</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">INTERSPEECH 2003</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="2913" to="2916" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Dou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bergsma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jiampojamarn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kondrak</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1, ACL &apos;09, Association for Computational Linguistics</title>
				<meeting>the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1, ACL &apos;09, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="118" to="126" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">An approach to syllables via some extensions of Marcus contextual grammars</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<idno type="DOI">10.1023/A:1024089129146</idno>
		<ptr target="https://doi.org/10.1023/A:1024089129146" />
	</analytic>
	<monogr>
		<title level="j">Grammars</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="1" to="12" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">On the data base of Romanian syllables and some of its quantitative and cryptographic aspects</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dinu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifth International Conference on Language Resources and Evaluation, LREC 2006</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Gangemi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Tapias</surname></persName>
		</editor>
		<meeting>the Fifth International Conference on Language Resources and Evaluation, LREC 2006<address><addrLine>Genoa, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2006">May 22-28, 2006</date>
			<biblScope unit="page" from="1795" to="1798" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">On the behavior of Romanian syllables related to minimum effort laws</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dinu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings Workshop Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages, co-located with RANLP 2009</title>
				<meeting>Workshop Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages, co-located with RANLP 2009<address><addrLine>Borovets, Bulgaria</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="9" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">On conformity of language structures within the Indo-European family to Poisson&apos;s law</title>
		<author>
			<persName><forename type="first">S</forename><surname>Chebanow</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Comptes rendus de l&apos;Académie des sciences de l&apos;URSS</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="99" to="102" />
			<date type="published" when="1947">1947</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Prolegomena to Menzerath&apos;s Law</title>
		<author>
			<persName><forename type="first">G</forename><surname>Altmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Glottometrika</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1" to="10" />
			<date type="published" when="1980">1980</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Predicting Romanian stress assignment</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Ciobanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<idno type="DOI">10.3115/V1/E14-4013</idno>
		<ptr target="https://doi.org/10.3115/v1/e14-4013" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Bouma</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Parmentier</surname></persName>
		</editor>
		<meeting>the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014<address><addrLine>Gothenburg, Sweden</addrLine></address></meeting>
		<imprint>
			<publisher>The Association for Computer Linguistics</publisher>
			<date type="published" when="2014">April 26-30, 2014</date>
			<biblScope unit="page" from="64" to="68" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Using a machine learning model to assess the complexity of stress systems</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Ciobanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Chitoran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Niculae</surname></persName>
		</author>
		<ptr target="http://www.lrec-conf.org/proceedings/lrec2014/summaries/1200.html" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Loftsson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Ninth International Conference on Language Resources and Evaluation, LREC 2014<address><addrLine>Reykjavik, Iceland</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2014">May 26-31, 2014</date>
			<biblScope unit="page" from="331" to="336" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Merriënboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gulcehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1406.1078</idno>
		<title level="m">Learning phrase representations using RNN encoder-decoder for statistical machine translation</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<title level="m">Adam: A method for stochastic optimization</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
