<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ItGraSyll: A Computational Analysis of Graphical Syllabification and Stress Assignment in Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Liviu P. Dinu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bogdan Iordache</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bianca Guita</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simona Georgescu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alina Cristea</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Human Language Technologies Research Center</institution>
          ,
          <addr-line>Bucharest</addr-line>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Bucharest, Faculty of Foreign Languages and Literatures</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Bucharest, Faculty of Mathematics and Computer Science</institution>
          ,
          <country country="RO">Romania</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we build a dataset of Italian graphical syllables (called ItGraSyll). We perform quantitative and qualitative analyses on the syllabification and stress assignment in Italian. We propose a machine learning model, based on deep-learning techniques, for automatically inferring syllabification and stress assignment. For stress prediction we report 94.45% word-level accuracy, and for syllabification we report 98.41% word-level accuracy and 99.82% hyphen-level accuracy.</p>
        <p>Word syllabification and syllable analysis are two related issues of great importance in the study of language (written or spoken). These topics have attracted a large category of researchers, from pure linguists, in phonetics, to psycholinguists, computer scientists, speech therapists, etc. Thus, the syllable plays an important role in language learning and acquisition, speech recognition, speech production [1, 2], language similarity [3], in text comprehensibility (Kincaid-Flesch formula [4]), in speech therapy, in poetry analysis [5, 6], etc. Each language has its own way of grouping sounds into syllables and its own rules for dividing words into syllables.</p>
        <p>Linguistically, the syllable represents "the smallest phonetic trance likely to receive an accent and only one" [7], and the syllabic cut is seen by De Saussure [8] on the border between the implosion and the explosion of the spoken sound: "If in a chain of sounds one goes from implosion to explosion, one obtains a particular effect which is the indication of the boundary of the syllable".</p>
        <p>The analysis of the words' syllabic structure also plays an important part in historical linguistics [9], not only in diachronic phonetics and phonology, but also in lexicology. Romance comparative linguistics, in particular, still needs a detailed overview of this aspect, as syllable, segmentation and prosody can give strong account on phonetic changes that haven't been explained yet. The “prosodic revolution” [10] from Latin to the Romance languages - including syncope (the loss of an intermediate syllable) and apocope (the loss of the final syllable) at a large scale - has led to major changes, but their weight is different from one idiom to another: while the Western Romance languages manifest highly evident differences from the Latin phonological and prosodic system, and the Eastern languages are considered to be most conservative from this point of view, Italian seems to be in between [10]. On the other hand, in Latin, the relation between stress and quantity grew stronger, thus short stressed vowels progressively gained length. It is noteworthy that this situation is best preserved in Italian, and not in the Eastern Romance idioms: thus, in Italian stress cannot skip a heavy penultimate syllable, and stress cannot fall further back than the antepenultimate syllable, a twofold characteristic feature of the Latin prosodic system. This is why we are taking Italian as a starting point for a larger-scale study, oriented towards all Romance languages. The main difference between Latin and its modern descendants is that Latin stress was quantity-sensitive, leading thus to the following rule: in polysyllabic words, stress fell on a heavy penultimate (meaning, containing a long vowel), otherwise on the antepenultimate. Due to the collapse of vowel quantity as a distinctive feature in the vocalic system, no Romance language has retained the Latin stress rule as such [10]. As, from a statistic point of view, the greatest part of the Romance lexicon is represented by penultimate stressed words, a basic automatic mechanism would assign penultimate stress by default, whereas for both final and antepenultimate stress, the machine (as well as, not in a few cases, non-native speakers) would need further specification.</p>
        <p>As a consequence of the loss of Latin vowel quantity, Romance stress has ceased to be completely predictable. That is, partially,</p>
      </abstract>
      <kwd-group>
        <kwd>syllabification</kwd>
        <kwd>stress assignment</kwd>
        <kwd>Italian</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>CLiC-it 2024: Tenth Italian Conference on Computational Linguistics, Dec 04-06, 2024, Pisa, Italy. * Corresponding author. Contact: ldinu@fmi.unibuc.ro (L. P. Dinu); iordache.bogdan1998@gmail.com (B. Iordache); bianca.guita@s.unibuc.ro (B. Guita); simona.georgescu@lls.unibuc.ro (S. Georgescu); alinaciobanu20@gmail.com (A. Cristea). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>why in the majority of the traditional Romance comparative or historical grammars, there is no specific section devoted to syllabification [11], or, if there is, it focuses either on general prosodic features [12], or on the vowel evolution depending on its presence in an open or closed syllable [13]. The lack of a section dedicated to syllabification is also common in the historical grammars of Italian [14, 11, 15]. We will focus in this research only on the written form of words, so we will investigate only the graphical syllabification and stress. By focusing on the graphical syllabification and stress in Italian, we aim to take a step forward towards the complete evaluation of the prosodic changes that took place in the transition from Latin to the Romance languages, and their influence on the Romance phonetics and phonology. A machine-learning model, capable of automatically inferring graphical syllabification and stress assignment, along with the purpose of creating a data-base containing the quantitative and qualitative description of syllabification and stress in the Romance languages, could be the first important task in the greater challenge of tracing the similarities and differences between the Romance languages and, more important, between Romance and Latin. From a typological point of view, the study of syllabification and stress can shed a new light on the universal features that, by defining our phonoarticulatory and phonoacoustic apparatus, have guided the languages’ development and change. Given the promising results of this analysis, the present study can establish the basis of a research of the syllable in other languages, either linguistically or typologically related to Italian.</p>
      <p>One of the studies that address automatic syllabification in Italian belongs to Bigi and Petrone [16], who proposed a tool that performs rule-based automatic segmentation. Adsett and Marchand [17] and Adsett et al. [18] investigated whether data-driven approaches outperform rule-based approaches for a language with a low syllabic complexity, such as Italian. The authors reached the conclusion that even in this case data-driven systems are the more appropriate approach. In terms of machine learning, the tasks of automatically inferring syllable boundaries and predicting stress assignment can be naturally framed as sequence labeling problems. While automatic syllabification has received more attention recently [19, 20, 21, 22, 23, 24], stress placement has not been investigated as much [25].</p>
      <p>Given the complexity of syllable applications and word syllabification, the presence of electronic resources dedicated to them becomes a necessity. While native speakers of a language generally do not have great difficulty in spelling words, the same cannot be said of those who learn a foreign language, who often tend to apply their own rules to foreign words, and problems arise in automatic syllabification. This is because the rules of syllabification are linguistic rules, and they cannot always be easily modeled by the computer when there are no other linguistic factors that those rules take into account. For example, a rule that is present in many languages distinguishes between a vowel and a semivowel, but the computer is not able to easily recognize when the same sign has the value of a vowel and when it is a semivowel. Because of this, rule-based adaptations of syllabification systems [26] generally have higher errors, and many languages do not have an automatic syllabification system yet (for example, in the Python library, only a few languages have syllabification). The last few decades have brought the first data-driven syllabification systems. However, in order to build such a system, training data is needed, and there are many cases in which the available data do not cover the whole language, and thus the systems have different results when the test corpus is changed.</p>
      <p>Starting with these remarks, our main contributions are:</p>
      <p>• We propose ItGraSyll (Italian graphical syllables), a dataset of 114,503 Italian words, in orthographic form, containing annotations for their orthographic syllabification and stress placement.<sup>1</sup></p>
      <p>• We perform quantitative and qualitative analyses of the previously built dataset.</p>
      <p>• We analyze stress placement in the context of the Italian syllables.</p>
      <p>• We propose an automatic system of syllabification for Italian words.</p>
      <p><sup>1</sup> The dataset is available for research purposes upon request at: https://nlp.unibuc.ro/resources.html#itgrasyll</p>
      <p>2. Quantitative Analysis</p>
      <p>In this section we perform various measurements regarding the syllables and stress placement of Italian written words and analyze the results. We perform, on Italian, an investigation similar to previous investigations conducted on Romanian by Dinu and Dinu [27], Dinu and Dinu [<xref ref-type="bibr" rid="ref4">28</xref>].</p>
      <p>2.1. Data</p>
      <p>We build a dataset of Italian words starting from the online version of Dizionario italiano De Mauro,<sup>2</sup> which provides information regarding graphical syllabification and stress placement for the Italian vocabulary. Stressed syllables are also shown by having accents on the dominant vowel. Going further, this dataset will be referred to as ItGraSyll.</p>
      <p>We performed several pre-processing steps. We cleaned the resulting dataset by removing duplicates, prefixes and suffixes in order to remain with the base word; abbreviations and unwanted punctuation marks such as dots, commas, apostrophes and dashes were also excluded so we can correctly process each word and its syllable division. Finally, the dataset consists of 114,503 words in orthographic form having between one and eleven syllables. The distribution of words per number of syllables is represented in Table 1.</p>
      <p><sup>2</sup> https://dizionario.internazionale.it/</p>
      <p>Table 1: Distribution of words per number of syllables (columns: #syll., #words, Examples).</p>
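      <p>The annotation format described above (syllables separated by hyphens, the stressed vowel carrying an accent) can be unpacked with a few lines of Python. The sketch below is illustrative only: the entry format <monospace>me-da-gliò-ne</monospace> and the helper names <monospace>strip_accents</monospace> and <monospace>parse_entry</monospace> are assumptions made for the example, not the authors' preprocessing code.</p>
      <preformat><![CDATA[
```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks, e.g. 'gliò' -> 'glio'."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def parse_entry(entry):
    """Split an accented, hyphenated entry into word, syllables and stress index.

    The input format is an assumption for this sketch: syllables separated by
    '-' and the stressed vowel written with an accent.
    """
    syllables = unicodedata.normalize("NFC", entry.strip().lower()).split("-")
    plain = [strip_accents(s) for s in syllables]
    # A syllable is stressed when removing accents changes it.
    stressed = [i for i, (raw, bare) in enumerate(zip(syllables, plain)) if raw != bare]
    return {
        "word": "".join(plain),
        "syllables": plain,
        "stress_index": stressed[0] if stressed else None,  # 0-based syllable index
    }


print(parse_entry("me-da-gliò-ne"))
```
]]></preformat>
      <p>Keeping the plain word, the syllable list and the stress index side by side makes it straightforward to derive the counts and label sequences used in the rest of the analysis.</p>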
      <sec id="sec-1-1">
        <title>2.2. Syllables</title>
      </sec>
      <sec id="sec-1-2">
        <title>2.3. Syllable Structure</title>
        <sec id="sec-1-2-1">
          <p>We identified a total of 67 different consonant-vowel structures. The most frequent 7 structures cover almost 97% of the total. Depending on the type-token ratio, the most frequent consonant-vowel structures are the following: a) for the type syllables: cvc (25%), ccvc (20.9%), cvvc (7.79%); b) for the token syllables: cv (58%), cvc (15%), ccv (7%), cvv (4.74%) and v (4.32%). Moreover, we observe that the cv structure corresponds to 40 out of the most frequent 50 syllables from the dataset.</p>
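          <p>A minimal sketch of how consonant-vowel structures and their type and token shares could be computed is given below. It assumes a purely letter-based mapping (a, e, i, o, u and their accented forms count as vowels, everything else as a consonant); the paper does not state how semivowels were handled, and the function names and toy word list are hypothetical.</p>
          <preformat><![CDATA[
```python
from collections import Counter

# Assumption: a purely graphical vowel set, including accented vowels.
VOWELS = set("aeiouàèéìíòóùú")


def cv_structure(syllable):
    """Map each letter to 'v' (vowel) or 'c' (consonant), e.g. 'glio' -> 'ccvv'."""
    return "".join("v" if ch in VOWELS else "c" for ch in syllable.lower())


def structure_shares(syllabified_words):
    """Return (type_shares, token_shares) of CV structures, in percent."""
    tokens = [syl for word in syllabified_words for syl in word.split("-")]
    token_counts = Counter(cv_structure(s) for s in tokens)
    type_counts = Counter(cv_structure(s) for s in set(tokens))

    def to_percent(counts):
        total = sum(counts.values())
        return {pattern: 100.0 * n / total for pattern, n in counts.most_common()}

    return to_percent(type_counts), to_percent(token_counts)


type_shares, token_shares = structure_shares(["me-da-glio-ne", "ca-sa", "ca-ne"])
print(type_shares)
print(token_shares)
```
]]></preformat>
          <p>Counting over distinct syllables gives the type view, while counting over every occurrence gives the token view, which is why the two rankings reported above differ.</p>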
        </sec>
      </sec>
      <sec id="sec-1-3">
        <title>2.4. Stress Placement</title>
        <sec id="sec-1-3-1">
          <p>We identified a total of 2,883 stressed syllables (type syllables). So, 847 syllables are never stressed. The most frequent 20 stressed syllables are represented in Table 3. We observe that the most frequent stressed syllable (men) has a very high stress ratio (90%) when we compare the stressed occurrences with all its occurrences (stressed and unstressed) in our database. While in the top 20 of all syllables, men is the only syllable of length 3 (on the 14th position), for stressed syllables there are a couple of other syllables with a length greater than 2 (zio on position 6 with 34% stress ratio, gia on position 19 with 65% stress ratio).</p>
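          <p>The stress ratio used above (stressed occurrences of a syllable divided by all of its occurrences) can be sketched as follows. The input format (hyphen-separated syllables with an accent on the stressed vowel) and the toy entries are assumptions chosen for illustration; they are not taken from ItGraSyll.</p>
          <preformat><![CDATA[
```python
import unicodedata
from collections import Counter


def strip_accents(text):
    """Remove combining accent marks so stressed and unstressed forms match."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


def stress_ratios(entries):
    """Return {syllable: stressed occurrences / all occurrences}."""
    total, stressed = Counter(), Counter()
    for entry in entries:
        for syllable in unicodedata.normalize("NFC", entry.lower()).split("-"):
            bare = strip_accents(syllable)
            total[bare] += 1
            if bare != syllable:  # the accent marks the stressed syllable
                stressed[bare] += 1
    return {syl: stressed[syl] / total[syl] for syl in total}


# Toy entries, hypothetical and only meant to exercise the function.
ratios = stress_ratios(["mén-te", "mo-mén-to", "men-tì-re"])
print(ratios["men"])
```
]]></preformat>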
          <p>We investigate stress placement with regard to syllable
structure and we provide in Table 4 the percentages of
words having the stress placed on different positions (for
top 5), counting syllables from the beginning and from
the end of the words as well. We observe that in most
cases the stress is placed on the second to last syllable.</p>
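          <p>Counting the stressed position from both ends of the word, as described above, could look like the sketch below. It assumes the same hypothetical entry format (hyphenated syllables, accent on the stressed vowel); the example words are only illustrative.</p>
          <preformat><![CDATA[
```python
import unicodedata
from collections import Counter


def has_accent(syllable):
    """True when the syllable contains an accented (stressed) vowel."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return any(unicodedata.category(ch) == "Mn" for ch in decomposed)


def stress_position(entry):
    """Return (position from start, position from end), both 1-based."""
    syllables = entry.split("-")
    for index, syllable in enumerate(syllables):
        if has_accent(syllable):
            return index + 1, len(syllables) - index
    return None  # no stress mark found


def position_shares(entries):
    """Percentage of words per stressed position, from the start and from the end."""
    from_start, from_end = Counter(), Counter()
    counted = 0
    for entry in entries:
        position = stress_position(entry)
        if position is None:
            continue
        from_start[position[0]] += 1
        from_end[position[1]] += 1
        counted += 1

    def to_percent(counts):
        return {k: 100.0 * v / counted for k, v in sorted(counts.items())}

    return to_percent(from_start), to_percent(from_end)


start_shares, end_shares = position_shares(
    ["me-da-gliò-ne", "pu-dì-co", "bà-la-no", "cor-sì-a"]
)
print(end_shares)
```
]]></preformat>
          <p>In the second dictionary, key 2 corresponds to the second to last syllable, the position the text reports as the most common.</p>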
          <p>Table 3: The most frequent 20 stressed syllables, with their stress ratio (%).</p>
        </sec>
        <sec id="sec-1-3-2">
          <title>2.5. Syllables’ Usage</title>
          <sec>
            <title>2.5.1. General Syllables</title>
            <p>The syllables have a less intuitive behaviour: usually a small number of syllables covers a large part of a language. This is valid for a large category of natural languages, including English, Dutch, Romanian [<xref ref-type="bibr" rid="ref4">28</xref>], Korean, Chinese, etc. We investigate here if this empirical law is also applicable to Italian. We made this investigation both on stressed and general syllables.</p>
            <p>The most frequent 30 Italian syllables (when stress placement is disregarded) cover almost 50% of the token syllables, the most frequent 50 syllables cover 61%, the most frequent 100 cover 74% and the most frequent 150 syllables (i.e. 4% of the type syllables) cover 80% of the token syllables. Over this number, the percentage of coverage rises slowly. 2,281 (61%) of the type syllables occur less than 10 times, and 1,174 syllables occur only once (hapax legomena).</p>
          </sec>
          <sec>
            <title>2.5.2. Stressed Syllables</title>
            <p>A similar trend can be observed also for the stressed syllables. Further, we notice that the most frequent syllables cover a wide ratio of the total syllable frequency. For example, the 10 most frequent stressed syllables represent 31% of the total of stressed syllables, the top 50 syllables, 60% and the top 200 syllables, 81% of the token syllables. The values are plotted in Figure 1, for all syllables and for stressed syllables.</p>
            <p>Figure 1: Coverage (y-axis) against the number of most frequent syllables (x-axis, 25 to 200), plotted for all syllables and for stressed syllables.</p>
          </sec>
        </sec>
        <sec id="sec-1-3-3">
          <p>This result shows that the law holds for Italian too: a very small number of syllables covers a large part of the Italian language (only 150 syllables are needed to cover 80% of the language).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Minimum Effort Laws</title>
      <sec id="sec-2-1">
        <p>In this section we discuss two minimum effort laws that have been previously investigated for other languages and verify whether they apply for Italian as well.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.1. Chebanow</title>
        <p>Denoting by f(n) the frequency of words having n syllables and by <inline-formula><tex-math><![CDATA[\lambda = \sum_n n f(n) / \sum_n f(n)]]></tex-math></inline-formula> the average length (measured in syllables) of the words, Chebanow [<xref ref-type="bibr" rid="ref5">29</xref>] proposed the following law between the average λ and the probability of occurrence P(n) of the words having n syllables:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[P(n) = \frac{(\lambda - 1)^{n-1}}{(n-1)!} \, e^{1-\lambda}]]></tex-math>
        </disp-formula>
        <p>For Italian, λ = 4.226.</p>
        <p>Figure 2, panel (a): The probability distribution of the length of words.</p>
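        <p>As a worked illustration, Chebanow's law has the form of a Poisson distribution over n - 1 with mean equal to the average length minus one, so the theoretical curve can be tabulated directly from the average length reported for Italian (4.226). The sketch below is an assumption-laden example rather than the authors' code: the names <monospace>mean_length</monospace>, <monospace>chebanow_probability</monospace> and <monospace>lam</monospace> are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def mean_length(frequency_by_length):
    """Average word length in syllables from a {n_syllables: n_words} table."""
    total_words = sum(frequency_by_length.values())
    return sum(n * f for n, f in frequency_by_length.items()) / total_words


def chebanow_probability(n, lam):
    """Theoretical probability of an n-syllable word for average length lam.

    This is a Poisson distribution over n - 1 with mean lam - 1.
    """
    return (lam - 1) ** (n - 1) / math.factorial(n - 1) * math.exp(1 - lam)


LAMBDA_ITALIAN = 4.226  # average length reported in the text

# Word lengths in the dataset range from one to eleven syllables.
theoretical = {n: chebanow_probability(n, LAMBDA_ITALIAN) for n in range(1, 12)}
for n, probability in theoretical.items():
    print(n, round(probability, 4))
```
]]></preformat>
        <p>With this average, the theoretical distribution peaks at four syllables. Comparing these values with the empirical relative frequencies (the output of <monospace>mean_length</monospace> applied to the real length table gives the average to plug in) reproduces the kind of comparison drawn in Figure 2.</p>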
      </sec>
      <sec id="sec-2-3">
        <p>In Figures 2a and 2b we plot the probability distribution of the length of words (in syllables): the practical and theoretical representations.</p>
        <p>We observe that the two curves have comparable shapes, with a more prominent peak for the probability distribution in Figure 2a; this peak can be influenced by the fact that it is determined based on all the words in the dictionary, where many 4-syllable words are present.</p>
      </sec>
      <sec>
        <title>3.2. Menzerath</title>
        <p>Menzerath’s law (later generalized by the Menzerath-Altmann law [<xref ref-type="bibr" rid="ref6">30</xref>]) states that the bigger the number of syllables in a word, the lesser the number of phonemes composing these syllables. In other words, Menzerath’s law expresses a negative correlation between the length of a word in syllables and the lengths in phonemes of its constitutive syllables. In cognitive economy terms, this means that the more complex a linguistic construct, the smaller its constituents. The law is expressed as follows:</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math><![CDATA[y = a \, x^{b} \, e^{-c x}]]></tex-math>
        </disp-formula>
        <p>where y is the syllable length (the size of the constituent), x is the number of syllables per word (the size of the linguistic construct), and a, b, c are empirical parameters. Figure 2c shows that the law is satisfied for Italian.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Automatic Syllabification and Stress Assignment</title>
      <p>We further investigate how a deep-learning model can automatically infer the syllabification and stress assignment of Italian words, given their orthographic representation.</p>
    </sec>
    <sec id="sec-4">
      <sec id="sec-4-1">
        <title>4.1. Methodology</title>
        <p>Both tasks can be defined in terms of a sequence labelling problem, a strategy which was previously used successfully for Romanian [<xref ref-type="bibr" rid="ref7 ref8">31, 32</xref>]. Let us consider, for example, the word medaglione (the Italian translation of the word "locket"). For syllabification we can label each letter from the word either with the label 1, denoting that a syllable starts from that letter, or with the label 0, meaning the respective letter is not the first letter in its syllable. Similarly, for identifying the stressed vowel, we can label its position with a 1 and all other letters are assigned the label 0. We thus obtain for our example the sequence 1010100010 for syllabification and the sequence 0000000100 for stress prediction (i.e. me-da-gliò-ne, the o vowel is stressed).</p>
        <p>With these definitions, we can now construct machine learning models for labelling the character sequences. The model we propose is a recurrent neural network based on Gated Recurrent Units (GRU) [<xref ref-type="bibr" rid="ref9">33</xref>]. The model architecture comprises the following components:</p>
        <p>• a character embedding layer, producing 64-dimensional vectors for each unique character</p>
        <p>• a stacked bidirectional GRU, with 3 layers and a 128-dimensional hidden state; a 0.2-rate dropout applied after each of the first two layers</p>
        <p>• 0.5-rate dropout, after the last GRU layer, along with one-dimensional batch normalization</p>
        <p>• a time-distributed fully-connected layer with 256 output nodes and ReLU activation</p>
        <p>• a linear layer that projects the 256-dimensional vector into a single number, on which sigmoid activation is applied to infer the binary labels.</p>
        <p>For training the models for both tasks, the dataset of words is split into 50% training examples and 50% test examples, unseen during training.</p>
        <p>The loss function computed for the prediction made for a word, regardless of the task on which the model is trained, is the average of two terms: the first one is the average character-wise binary cross-entropy, while the second one is the root mean squared error computed between the vector of predicted labels and the ground-truth vector. The model is optimized using the Adam optimizer [<xref ref-type="bibr" rid="ref10">34</xref>], with a learning rate of 0.0003, no weight decay, batch size of 32, and a learning-rate scheduler that halves the rate every 5 epochs. The models are trained for 10-15 epochs.</p>
        <p>For the task of automatic syllabification, we wanted to check if the presence of the stress markers affects the performance of the model. Because of that, we trained two models: the first one was trained using the spelling of the words with the stress markers removed, while the second one was trained with them included.</p>
        <p>4.2. Results Analysis</p>
        <p>Table 5 contains the metrics computed on the test set, using the models trained for syllabification (both with and without stress markers) and the model trained for predicting the stressed vowel. We obtained a remarkable hyphen accuracy of 99.74% for syllabification without the stress markers, and, when we add the stress markers, the accuracy increases to 99.82%.</p>
        <p>Including the stress markers into the data used for syllabification improved the metrics across the board, most notably with a ∼1% increase in word-level accuracy, which, considering the large amount of data and the high accuracy scores, is a significant improvement (460 fewer syllabification mistakes as opposed to the approach that excludes stress markers). Regarding the stress prediction, we obtained an accuracy of 94.45%. Table 6 showcases a series of wrong predictions generated by the models on the test sets for stress assignment and syllabification.</p>
        <p>We also look into the accuracy scores computed for the test set, when it is bucketed based on the real number of syllables of the test words. These results are shown in Figure 3 and Table 7. For stress assignment, accuracy decreases to a global minimum for disyllabic words, then starts to increase again with the number of syllables. For the syllabification task, including the stress markers seems to outperform excluding them in most scenarios, while both accuracies achieve a peak around the 5 syllables mark. This result seems to align with the distribution of syllables in the dataset, i.e. obtaining higher scores for the number of syllables with more examples. For stress assignment errors, we also investigate the placement of the predicted stressed syllable in relation with the true one (see Table 8). 95.6% of the errors misplaced the stressed syllable at most one position to the left, or to the right, while almost two thirds of the erroneous predictions placed the stress on the first syllable to the right of the correct one.</p>
        <table-wrap>
          <label>Table 6</label>
          <caption>
            <p>Wrong predictions generated by the models on the test sets for stress assignment and syllabification.</p>
          </caption>
          <table>
            <thead>
              <tr><th colspan="2">Stress Assignment Errors</th><th colspan="2">Syllabification Errors</th></tr>
              <tr><th>Predicted</th><th>True</th><th>True</th><th>Predicted</th></tr>
            </thead>
            <tbody>
              <tr><td>balanò</td><td>bàlano</td><td>mu-o-ne</td><td>muo-ne</td></tr>
              <tr><td>femòre</td><td>fèmore</td><td>bion-da</td><td>bi-on-da</td></tr>
              <tr><td>dolmèn</td><td>dòlmen</td><td>cli-en-te</td><td>clien-te</td></tr>
              <tr><td>tutòlo</td><td>tùtolo</td><td>co-di-a-to</td><td>co-dia-to</td></tr>
              <tr><td>pùdico</td><td>pudìco</td><td>ma-nu-brio</td><td>ma-nu-bri-o</td></tr>
              <tr><td>còrsia</td><td>corsìa</td><td>spa-tria-to</td><td>spa-tri-a-to</td></tr>
            </tbody>
          </table>
        </table-wrap>
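        <p>The label encoding and the evaluation measures discussed in this section can be sketched in a few functions. This is a hedged illustration, not the authors' evaluation code: the paper does not define hyphen-level accuracy formally, so the sketch assumes it is the fraction of slots between adjacent letters whose boundary label is predicted correctly, and all function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import unicodedata


def boundary_labels(entry):
    """'me-da-glio-ne' -> '1010100010' (1 marks the first letter of a syllable)."""
    syllables = unicodedata.normalize("NFC", entry).split("-")
    return "".join("1" + "0" * (len(syl) - 1) for syl in syllables)


def stress_labels(entry):
    """'me-da-gliò-ne' -> '0000000100' (1 marks the accented, stressed vowel)."""
    letters = unicodedata.normalize("NFC", entry).replace("-", "")
    return "".join(
        "1" if len(unicodedata.normalize("NFD", ch)) > 1 else "0" for ch in letters
    )


def word_accuracy(pairs):
    """Share of words whose whole label sequence is predicted correctly."""
    return sum(true == pred for true, pred in pairs) / len(pairs)


def hyphen_accuracy(pairs):
    """Share of correct labels over slots between letters (assumed definition)."""
    correct = total = 0
    for true, pred in pairs:
        for t, p in zip(true[1:], pred[1:]):  # position 0 always starts a syllable
            correct += t == p
            total += 1
    return correct / total


def stress_offset(true_stress, pred_stress, boundaries):
    """Signed distance in syllables from the true to the predicted stress."""
    def syllable_of(labels):
        position = labels.index("1")
        return boundaries[: position + 1].count("1") - 1
    return syllable_of(pred_stress) - syllable_of(true_stress)


print(boundary_labels("me-da-glio-ne"))  # 1010100010
print(stress_labels("me-da-gliò-ne"))    # 0000000100
```
]]></preformat>
        <p>For the first stress error in Table 6 (true bàlano, predicted balanò), <monospace>stress_offset("010000", "000001", "101010")</monospace> gives 2, assuming the syllabification ba-la-no: the predicted stress falls two syllables to the right of the true one.</p>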
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <sec id="sec-5-1">
        <p>In this paper we have investigated graphical syllabification and graphical stress assignment for Italian words. We have started by building ItGraSyll, a dataset of Italian graphical syllabified words, with stress annotations as well, on which we have performed several quantitative and qualitative analyses, including the verification of two minimum effort laws for the case of Italian. Finally, we have proposed a recurrent neural network machine learning model for automatic syllabification and stress assignment for Italian written words. For stress prediction we have obtained 94.45% word-level accuracy, and for syllabification we have obtained 98.41% word-level accuracy and 99.82% hyphen-level accuracy. In future work we intend to extend the analysis from dictionary level to corpus level and to investigate other languages as well.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <sec id="sec-6-1">
        <p>We want to thank the reviewers for their useful suggestions. Research supported by the Ministry of Research, Innovation and Digitization, CNCS/CCCDI UEFISCDI, SiRoLa project, number PN-IV-P1-PCE-2023-1701, Romania.</p>
      </sec>
      <sec id="sec-6-2">
        <p>in Computer Science, Springer, 2005, pp. 785–788. URL: https://doi.org/10.1007/978-3-540-30586-6_88. doi:10.1007/978-3-540-30586-6_88.</p>
        <p>[4] J. P. Kincaid, L. R. P. F. Jr., R. L. Rogers, B. S. Chissom, Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease formula) for Navy enlisted personnel, Research Branch Report, Millington, TN: Chief of Naval Training, 1975.</p>
        <p>[5] G. Marco, J. de la Rosa, J. Gonzalo, S. Ros, E. González-Blanco, Automated Metric Analysis of Spanish Poetry: Two Complementary Approaches, IEEE Access 9 (2021) 51734–51746.</p>
        <p>[6] A. M. Ciobanu, L. P. Dinu, On the romanian rhyme detection, in: Proceedings of COLING 2012: Demonstration Papers, 2012, pp. 87–94.</p>
        <p>[7] L. Hjelmslev, The syllable as a structural unit, in: the Proceedings of the 3rd International Congress of Phonetic Sciences (Ghent), 1938, volume 266, 1938.</p>
        <p>[8] F. De Saussure, Course in general linguistics, Columbia University Press, 2011.</p>
        <p>[9] D. Russo, The Notion of Syllable across History, Theories and Analysis, Cambridge Scholars Publishing, 2016.</p>
        <p>[10] M. Loporcaro, Syllable, segment and prosody, in: The Cambridge history of the Romance languages, 2011, pp. 50–108.</p>
        <p>[11] W. Meyer-Lübke, Grammaire des langues romanes, volume 4, H. Welter, 1906.</p>
        <p>[12] M.-D. Glessgen, Linguistique romane: domaines et méthodes en linguistique française et romane, Armand Colin, 2007.</p>
        <p>[13] F. S. Miret, Fonética histórica, in: Manual de lingüística románica, Ariel España, 2007, pp. 227–250.</p>
        <p>[14] F. d’Ovidio, W. Meyer-Lübke, Grammatica storica della lingua e dei dialetti italiani, volume 368, U. Hoepli, 1906.</p>
        <p>[15] G. Rohlfs, T. Franceschi, Grammatica storica della lingua italiana e dei suoi dialetti: Morfologia, (No Title) (1968).</p>
        <p>[16] B. Bigi, C. Petrone, A generic tool for the automatic syllabification of italian, A generic tool for the automatic syllabification of Italian (2014) 73–77.</p>
        <p>[17] C. R. Adsett, Y. Marchand, Are Rule-based Syllabification Methods Adequate for Languages with Low Syllabic Complexity? The Case of Italian, in: P. Wagner, J. Abresch, S. Breuer, W. Hess (Eds.), Sixth ISCA Workshop on Speech Synthesis, Bonn, Germany, August 22-24, 2007, ISCA, 2007, pp. 58–63.</p>
        <p>[18] C. R. Adsett, Y. Marchand, V. Keselj, Syllabification rules versus data-driven methods in a language with low syllabic complexity: The case of italian, Comput. Speech Lang. 23 (2009) 444–463. URL: https://doi.org/10.1016/j.csl.2009.02.004. doi:10.1016/j.csl.2009.02.004.</p>
        <p>[19] K. A. Rogova, K. Demuynck, D. V. Compernolle, Automatic syllabification using segmental conditional random fields, in: Computational Linguistics in the Netherlands Journal, volume 3, 2013, pp. 34–48.</p>
        <p>[20] L. P. Dinu, V. Niculae, O. Sulea, Romanian syllabication using machine learning, in: I. Habernal, V. Matousek (Eds.), Text, Speech, and Dialogue 16th International Conference, TSD 2013, Pilsen, Czech Republic, September 1-5, 2013. Proceedings, volume 8082 of Lecture Notes in Computer Science, Springer, 2013, pp. 450–456.</p>
        <p>[21] J. Krantz, M. W. Dulin, P. D. Palma, Language-Agnostic Syllabification with Neural Sequence Labeling, 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA) (2019) 804–810.</p>
        <p>[22] V. N. Vitale, L. Schettino, F. Cutugno, On incrementing interpretability of machine learning models from the foundations: A study on syllabic speech units, in: F. Boschetti, G. E. Lebani, B. Magnini, N. Novielli (Eds.), Proceedings of the 9th Italian Conference on Computational Linguistics, Venice, Italy, November 30 - December 2, 2023, volume 3596 of CEUR Workshop Proceedings, CEUR-WS.org, 2023. URL: https://ceur-ws.org/Vol-3596/paper51.pdf.</p>
        <p>[23] O. Sulea, L. P. Dinu, B. Dumitru, Full inflection learning using deep neural networks, in: A. F. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing - 19th International Conference, CICLing 2018, Hanoi, Vietnam, March 18-24, 2018, Revised Selected Papers, Part I, volume 13396 of Lecture Notes in Computer Science, Springer, 2018, pp. 408–415. URL: https://doi.org/10.1007/978-3-031-23793-5_33. doi:10.1007/978-3-031-23793-5_33.</p>
        <p>[24] M. Petrillo, F. Cutugno, A syllable segmentation algorithm for english and italian, in: INTERSPEECH 2003, 2003, pp. 2913–2916.</p>
        <p>[25] Q. Dou, S. Bergsma, S. Jiampojamarn, G. Kondrak, A Ranking Approach to Stress Prediction for Letter-to-Phoneme Conversion, in: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 Volume 1, ACL ’09, Association for Computational Linguistics, 2009, p. 118–126.</p>
        <p>[26] L. P. Dinu, An approach to syllables via some extensions of marcus contextual grammars, Grammars 6 (2003) 1–12. URL: https://doi.org/10.1023/A:1024089129146. doi:10.1023/A:1024089129146.</p>
        <p>[27] L. P. Dinu, A. Dinu, On the data base of romanian syllables and some of its quantitative and cryp</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Suyanto</surname>
          </string-name>
          ,
          <article-title>Incorporating syllabification points into a model of grapheme-to-phoneme conversion</article-title>
          ,
          <source>International Journal of Speech Technology</source>
          <volume>22</volume>
          (
          <year>2019</year>
          )
          <fpage>459</fpage>
          -
          <lpage>470</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vitale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cutugno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Origlia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Coro</surname>
          </string-name>
          ,
          <article-title>Exploring emergent syllables in end-to-end automatic speech recognizers through model explainability technique</article-title>
          ,
          <source>Neural Comput. Appl.</source>
          <volume>36</volume>
          (
          <year>2024</year>
          )
          <fpage>6875</fpage>
          -
          <lpage>6901</lpage>
          . URL: https://doi.org/10.1007/s00521-024-09435-1. doi:10.1007/S00521-024-09435-1.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <article-title>On the syllabic similarities of Romance languages</article-title>
          , in: A. F. Gelbukh (Ed.),
          <source>Computational Linguistics and Intelligent Text Processing</source>
          , 6th International Conference, CICLing 2005, Mexico City, Mexico, February 13-19,
          <year>2005</year>
          , Proceedings, volume
          <volume>3406</volume>
          of Lecture Notes
        </mixed-citation>
        <mixed-citation>
          [27]
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <article-title>On the data base of Romanian syllables and some of its quantitative and cryptographic aspects</article-title>
          , in: N. Calzolari, K. Choukri, A. Gangemi, B. Maegaard, J. Mariani, J. Odijk, D. Tapias (Eds.),
          <source>Proceedings of the Fifth International Conference on Language Resources and Evaluation</source>
          , LREC 2006, Genoa, Italy, May 22-28, 2006, European Language Resources Association (ELRA),
          <year>2006</year>
          , pp.
          <fpage>1795</fpage>
          -
          <lpage>1798</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <article-title>On the behavior of Romanian syllables related to minimum effort laws</article-title>
          , in:
          <source>Proceedings Workshop Multilingual Resources, Technologies and Evaluation for Central and Eastern European Languages, co-located with RANLP 2009</source>
          , Borovets, Bulgaria 2006,
          <year>2009</year>
          , pp.
          <fpage>9</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chebanow</surname>
          </string-name>
          ,
          <article-title>On conformity of language structures within the Indoeuropean family to Poisson's law</article-title>
          ,
          <source>Comptes rendus de l'Academie de science de l'URSS</source>
          <volume>55</volume>
          (
          <year>1947</year>
          )
          <fpage>99</fpage>
          -
          <lpage>102</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>G.</given-names>
            <surname>Altmann</surname>
          </string-name>
          ,
          <article-title>Prolegomena to Menzerath's Law</article-title>
          ,
          <source>Glottometrika</source>
          <volume>2</volume>
          (
          <year>1980</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Ciobanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <article-title>Predicting Romanian stress assignment</article-title>
          , in: G. Bouma, Y. Parmentier (Eds.),
          <source>Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, April 26-30</source>
          ,
          <year>2014</year>
          , Gothenburg, Sweden, The Association for Computer Linguistics,
          <year>2014</year>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>68</lpage>
          . URL: https://doi.org/10.3115/v1/e14-4013. doi:10.3115/V1/E14-4013.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Dinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Ciobanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Chitoran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Niculae</surname>
          </string-name>
          ,
          <article-title>Using a machine learning model to assess the complexity of stress systems</article-title>
          , in: N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, S. Piperidis (Eds.),
          <source>Proceedings of the Ninth International Conference on Language Resources and Evaluation</source>
          , LREC 2014, Reykjavik, Iceland, May 26-31, 2014, European Language Resources Association (ELRA),
          <year>2014</year>
          , pp.
          <fpage>331</fpage>
          -
          <lpage>336</lpage>
          . URL: http://www.lrec-conf.org/proceedings/lrec2014/summaries/1200.html.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>van Merriënboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bougares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          ,
          <source>arXiv preprint arXiv:1406.1078</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>