Evaluating Speech Synthesis on Mathematical Sentences

Alessandro Mazzei, Università degli Studi di Torino, alessandro.mazzei@unito.it
Michele Monticone, Università degli Studi di Torino, michele.monticone@edu.unito.it
Cristian Bernareggi, Università degli Studi di Torino, cristian.bernareggi@google.com

Abstract

English. In this paper we present the main features of a rule-based architecture that transforms a LaTeX-encoded mathematical expression into its equivalent mathematical sentence, i.e. a natural language sentence expressing the semantics of the mathematical expression. Moreover, we describe the main results of a first human-based evaluation of the system for the Italian language, focusing on speech synthesis engines.

Italiano. In questo lavoro presentiamo le caratteristiche principali di un'architettura software a regole per trasformare un'espressione matematica, codificata in LaTeX, nella sua equivalente frase matematica, cioè una frase del linguaggio naturale che esprima la stessa semantica dell'espressione originale. Inoltre, descriviamo i primi risultati di una valutazione del sistema fatta da esseri umani per la lingua italiana riguardante principalmente i motori di sintesi del parlato.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Computational linguistics can help people in many ways, especially in the field of assistive technologies. In the mathematical domain, blind people can access a mathematical expression by listening to its LaTeX source. However, this process has several drawbacks. First of all, it assumes knowledge of LaTeX. Second, listening to LaTeX is slow and error-prone, since LaTeX is a typographical language, that is, a language designed for specifying the details of typographical visualization rather than for efficiently communicating the semantics of a mathematical expression. For instance, the simple LaTeX expression f(x) is a typographical description, and so it represents both the application of the function f to x and the multiplication of the variable f by the variable x surrounded by parentheses.

There are many lines of research aimed at enabling people with sight impairments to access mathematical content. Mathematical expressions can be embedded in web pages not only as images but also through MathML or MathJax (Cervone, 2012), and in PDF documents produced from LaTeX (Ahmetovic et al., 2018). Other research directions concern conversion into Braille (Soiffer, 2016) and speech reading (Raman, 1996; Schweikhardt et al., 2006; Sorge et al., 2014).

In this paper we follow another direction: we consider the possibility of producing a mathematical sentence, i.e. a natural language sentence expressing the semantics of a mathematical expression. The idea of using mathematical sentences to improve the accessibility of mathematical expressions has previously been presented and tested for Spanish (Ferres and Fuentes Sepúlveda, 2011; Fuentes Sepúlveda and Ferres, 2012). However, in contrast to previous work on mathematical sentences, in this work we use a natural language generation (NLG) architecture rather than a template-based one for generating sentences. By using an NLG architecture we obtain (i) more portability and (ii) greater and simpler customization of the output.

We have two research goals in this paper. The first goal is to describe a system for transforming a mathematical expression natively encoded in LaTeX into its equivalent mathematical sentence (cf. Figure 1). The processing flow follows a well-known approach, called interlingua in the field of machine translation (Hutchins and Somers, 1992).
Figure 1: The software architecture for the generation of mathematical sentences. The process starts from (1) the LaTeX representation of the expression and proceeds through (2) its translation into CMML, (3) the enhancement of the CMML, (4) the generation of the written form of the mathematical sentence, and (5) the production of the audio form of the mathematical sentence.
[Diagram components: LatexML, PostProcessor, S2S, SynthCaller. Data: LaTeX mathematical expression, CMML, Enhanced CMML, written math. sentence, audio math. sentence.]

Indeed, generating a mathematical sentence from its LaTeX source is a two-step algorithm. In the first step the LaTeX is analyzed and its semantics is represented in Content MathML (CMML henceforth), a W3C standard for expressing the syntax and the semantics of mathematical expressions¹. In the second step, the CMML representation is the input of the S2S (Semantics to Speech) module, an NLG module that generates the mathematical sentence. Note that the S2S module also inserts parentheses and pauses in the sentence. Finally, an external synthesis engine transforms the sentence into an audio format.

¹ https://www.w3.org/TR/MathML3/chapter4.html

The second goal of this paper is to give a first evaluation of the performance of two distinct synthesis engines in the domain of mathematical sentences. With a pilot experiment conducted with four blind people, we compare how mathematical sentences are perceived when produced by a neural-network-based speech engine and by a formant-based speech engine.

In Section 2 we describe the main features of the developed system, in Section 3 we describe the experiment, and in Section 4 we close the paper with some conclusions and directions for future work.

2 Building Mathematical Sentences

The first step of our algorithm is the generation of the CMML associated with a LaTeX formula. We based this step on an external tool named LaTeXML (Miller, 2007). However, the CMML obtained from this tool needed to be enhanced by a post-processing procedure in order (1) to make it conform to the CMML standard and (2) to remove ambiguities, as in the case of y = f(x). In Figure 2 we report the CMML representation of the mathematical expression x > b ⟹ |f(x)| < M.

Figure 2: The CMML representation of the mathematical expression x > b ⟹ |f(x)| < M.

Mathematical notation has been conceived with the aim of representing mathematical concepts using a specific written symbolic language. As a working hypothesis, we decided to assume a "specialized" syntactic analysis for a number of mathematical objects. For instance, "x plus three" indicates the action of adding one quantity to another, so it can be represented as a declarative structure. As a consequence, "plus" can be analysed as a verb, and this assumption can be extended to all mathematical sentences. In this paper we considered only the mathematical structures belonging to the subfield of mathematical analysis. In particular, we considered all the expressions in an Italian analysis book (Pandolfi, 2013). By using this corpus of expressions, and by assuming that all numbers and variables can be treated as nouns and that all arithmetic operators can be treated as verbs, we found eight additional categories for representing all complex mathematical expressions and we defined a specific syntactic construction for each category.

In Table 1 we report some examples of syntactic constructions for mathematical expressions. We decided to analyse and represent the mathematical sentences of relational operators as copula sentences (a è maggiore di b, "a is greater than b"); algebraic operators as declarative sentences (a prodotto cartesiano b, "a cartesian product b"); logical operators as conjunctions (a o b, "a or b"); elementary operators (e.g. radice, "radical"), sequences (e.g. limite, "limit") and calculus (e.g. integrale, "integral") as noun phrases (La radice quadrata di x, "the square root of x"); pairs and conditional sets as reduced relatives (L'insieme delle x tali che x è minore di 3, "the set of x such that x is less than 3"). Our syntactic representations for mathematical operators in the analysis domain could have alternative representations or could be specialized into a more refined classification (cf. Chang, 1983), but we decided to use only eight categories for the sake of simplicity.

Mathematical Expression              Construction
>, ≥, ...                            Copula
+, −, ∗, ...                         Declarative
∧, ∨, ¬, ...                         Coordination
sin, cos, tan, ...                   Noun Phrase
\sum_{[x=a]}^{[b]} [f(x)], ...       Noun Phrase
\int_{[a]}^{[b]} f(x) dx             Noun Phrase
\{ [vars] \mid conditions \}         Reduced Relative
([x], [y])                           Reduced Relative

Table 1: Mathematical expressions and their linguistic constructions.

Traditional NLG architectures split the generation process into three distinct phases: document planning, sentence planning and realization (Reiter and Dale, 2000; Gatt and Krahmer, 2018). In particular, document planning decides what to say, while sentence planning and realization decide how to say it. In the system architecture depicted in Figure 1, the content of the communication is specified by the input mathematical expression, so the content selection phase is not necessary at all. In Section 2.1 we give some details on the rule-based sentence planner designed for managing mathematical sentences, and in Section 2.2 we describe the use of the SimpleNLG-it realizer in the mathematical domain.

2.1 Building a Sentence Planner for Mathematical Sentences

The input of the sentence planner is a mathematical expression in the form of enhanced CMML. In order to associate it with a sentence plan, that is, a sort of under-specified tree-based syntactic structure, we devised a recursive algorithm that traverses the CMML structure top-down.

For each of the eight categories used to classify all mathematical expressions, we designed a prototypical sentence plan that is used in the recursive process. Each prototype builds a specific linguistic construction (e.g. copula, reduced relative, etc.) that is designed to give syntactic roles to the arguments of the specific mathematical construction. For instance, on the left of Figure 3 we report the prototypical sentence plan for the conditional set mathematical structure, and on the right an example of its instantiation. In the final structures, (1) the leaves of the sentence plan are lemmas rather than words, and (2) the syntactic relations among the nodes are expressed using both dependency relations (e.g. subj, complement) and constituency nodes (e.g. Prepositional Phrase, PP). Note that this is the input format for sentence plans required by the SimpleNLG realizer (see Section 2.2).

Figure 3: The prototypical sentence plan for the conditional set mathematical structure (left), and its instantiation producing the sentence L'insieme degli x tali che x è minore di 0 (right, "the set of all x such that x is less than 0").

In order to build a complete sentence plan for a mathematical sentence by using the eight categories for mathematical expressions, two important issues must be addressed.

The first issue concerns the perception of the precedence of arithmetic operators. Listening to mathematics has some peculiarities with respect to reading it. For instance, division is granted a higher precedence than addition, and when reading, the expression a + b/c is parsed as $a + \frac{b}{c}$ without ambiguity. A different result arises if one listens to the equivalent mathematical sentence "a plus b divided by c" without reading the expression: we observed that the most frequently perceived parse is $\frac{a+b}{c}$. After a limited number of listening experiments on arithmetic expressions with different people (both blind and sighted), we decided to state as a working hypothesis that the precedence of the arithmetic operators is perceived in the reverse order when one listens to a mathematical expression without reading it².

² We have not been able to find any scientific reference on this point.

The second issue is how to represent the correct structure of the operators. In other words, how can we build a mathematical sentence unambiguously equivalent to $\frac{a+b}{c}$? A trivial but effective solution is to use parentheses, that is, to produce the mathematical sentence "open parenthesis a plus b close parenthesis divided by c". However, the drawback of this solution is the length of the sentence, which can grow substantially for very complex expressions.

In order to account for both issues, we modified the sentence planner in two ways. First, we decided to model parentheses as lexical items, that is, we considered open-parenthesis and closed-parenthesis as two new lexical items of the SimpleNLG lexicon, which can be used as pre-modifier and post-modifier of a mathematical sentence respectively. Second, similarly to Fuentes Sepúlveda and Ferres (2012), we allowed a speech pause to be used as a synonym of the open/closed-parenthesis items. Moreover, in order to experiment with both parentheses and pauses in the understanding of a mathematical sentence, we decided to implement three distinct parenthesization strategies, called parenthesis, pause, and smart. In the parenthesis strategy, all the necessary parentheses are inserted in the sentence plan. Note that a parenthesis is considered necessary with respect to the inverted precedence order hypothesis stated above. In the pause strategy, all the necessary pauses are inserted in the sentence plan. In the smart strategy, all the necessary parentheses are inserted in the higher nodes of the sentence plan, and the necessary pauses are inserted close to the leaves of the sentence plan. This is a hybrid strategy that combines parentheses and pauses in order to obtain a less verbose mathematical sentence.

2.2 NLG for spoken mathematics

In order to produce spoken mathematical sentences in Italian with the SimpleNLG-it realizer (Mazzei et al., 2016), we needed to build a domain-specific lexicon for the field of mathematical analysis. SimpleNLG-it is the Italian port of the SimpleNLG realizer, which was originally designed only for English (Gatt and Reiter, 2009). As its default Italian lexicon, SimpleNLG-it uses a basic vocabulary of around 7000 words, a simple lexicon designed to be perfectly understood by most Italian speakers (Mazzei, 2016; Conte et al., 2017; Ghezzi et al., 2018). However, for this specific project we needed to augment the basic lexicon with a specialized mathematical lexicon that contains both (i) new lexical entries (such as arcotangente, "arctangent") and (ii) new values for lexical entries that are already in the basic lexicon (such as the value noun for the part of speech of the lemma integrale, "integral"). This specialized lexicon contains 113 entries, which are mostly categorized as nouns (e.g. logaritmo, "logarithm"), verbs (e.g. intersecare, "intersect") and adjectives (e.g. iperbolico, "hyperbolic").
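As a rough illustration of the two kinds of augmentation just described, the following Python sketch merges a specialized lexicon into a base one. It does not reproduce the SimpleNLG-it lexicon format: the dictionary layout, the function name augment_lexicon, and the assumption that integrale is already listed as an adjective in the basic lexicon are all hypothetical and serve only to make the idea concrete.

```python
from typing import Dict, Set

# Hypothetical lexicon layout: lemma -> set of part-of-speech values.
Lexicon = Dict[str, Set[str]]


def augment_lexicon(base: Lexicon, specialized: Lexicon) -> Lexicon:
    """Return a new lexicon that extends `base` with `specialized`.

    Lemmas missing from `base` become new entries; lemmas already
    present only gain the additional part-of-speech values.
    """
    merged: Lexicon = {lemma: set(pos) for lemma, pos in base.items()}
    for lemma, pos in specialized.items():
        merged.setdefault(lemma, set()).update(pos)
    return merged


# Toy stand-in for the basic vocabulary (assumed content, not the real one).
base_lexicon: Lexicon = {"integrale": {"adjective"}}

# Toy stand-in for the mathematical lexicon (lemmas taken from the text).
math_lexicon: Lexicon = {
    "arcotangente": {"noun"},     # (i) a new lexical entry
    "integrale": {"noun"},        # (ii) a new value for an existing entry
    "logaritmo": {"noun"},
    "intersecare": {"verb"},
    "iperbolico": {"adjective"},
}

lexicon = augment_lexicon(base_lexicon, math_lexicon)
print(sorted(lexicon["integrale"]))
```

In this toy setting integrale ends up with both part-of-speech values, while the base lexicon itself is left unchanged, so the specialized entries can be kept as a separate, domain-specific layer.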
In the lexicon there are only two new adverbs (relativamente and propriamente, "relatively" and "properly") and only one "prepositional locution" (tale che, "such that"). Finally, we added specific lexical items to realize both parentheses (parentesi aperta and parentesi chiusa, "open parenthesis" and "closed parenthesis") and the speech pause. The latter item is realized by using the SSML (Speech Synthesis Markup Language) tag <break>, which can be processed by many speech synthesis engines³.

³ https://www.w3.org/TR/speech-synthesis11/

The current version of the mathematical sentence generator has been interfaced with two speech synthesis engines: the web service provided by the IBM-Watson framework⁴ (W-engine henceforth) and the Espeak API⁵ (E-engine henceforth). W-engine is commercial, closed software based on deep learning, while E-engine is free, open-source software based on formant synthesis algorithms. Note that W-engine sounds more fluent to people who are not visually impaired, whereas E-engine sounds more familiar to visually impaired people, since it is used by a widespread free screen reader.

⁴ https://www.ibm.com/watson/services/text-to-speech/
⁵ http://espeak.sourceforge.net

3 Evaluation

In order to have a first evaluation of the generation system, we built a web-based test explicitly designed for visually impaired people. We designed a questionnaire composed of 6 multiple-choice questions concerning personal data, a core of 25 open questions, each concerning the listening of a mathematical sentence and its comprehensibility, 1 Likert-scale question globally comparing the comprehensibility of LaTeX and of the system, and 1 open question for free comments.

The 25 core questions all follow the same schema: there is an audio file encoding a mathematical sentence and an open form for transcribing it. In the instructions, we asked the users to fill in this section by using "LaTeX or another unambiguous formal representation". The mathematical expressions obtained were manually translated to CMML for evaluation. We implemented the questionnaire with the Google Form framework, which had been preliminarily judged accessible by a blind person. In this paper we discuss the results of 10 core questions of the questionnaire, created from the 5 mathematical expressions in Table 2. We used the W-engine to build 5 mathematical sentences and the E-engine to build the other 5. Note that we changed the names of the variables between the two sets of sentences.

ID    Formula
E1    A \times B = \{(x, y) \mid x \in A, y \in B\}
E4    x > b \implies |f(x)| < M
E6    \lim_{x \to x_0} \left( \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) \right) = 0
E8    \int \frac{1}{\sqrt{m^2 - x^2}} \, dx = \arcsin \frac{x}{m} + c
E10   \lim \left( 1 + \frac{1}{n} \right)^n = e

Table 2: The five mathematical expressions used for experimentation.

In order to score the comprehension of the users we decided to use the SPICE metric (Anderson et al., 2016). SPICE is obtained by computing the F-score of the overlap between two trees: the overlap is measured by decomposing the trees into typed elementary substructures, namely operands, operators and their relations. For instance, the expression x − 1 is decomposed as {1, x, minus, (op: minus, first: x), (op: minus, second: 1)} (cf. Anderson et al., 2016, for more details).

For the experiment, we recruited 4 visually impaired people by personal invitation and without any reward. All users are native Italian speakers, have a good knowledge of mathematical analysis and hold a bachelor's degree (only one of them in a field related to mathematics).

In Table 3 we report the averaged values of SPICE for W-engine and E-engine. At first sight the data seem to suggest a preference for the E-engine, but there is no significant effect on the performance of the system: a t-test gives a two-tailed p-value of 0.08, which is not statistically significant. New experiments with more trials and users are therefore necessary to statistically confirm the preference for the E-engine.

Engine     U1           U2           U3           U4
W-engine   0.96 (0.06)  0.95 (0.12)  0.97 (0.06)  0.97 (0.06)
E-engine   0.99 (0.03)  0.99 (0.03)  0.97 (0.04)  0.97 (0.04)

Table 3: The averaged SPICE measures and standard deviations for the speech synthesis W-engine and E-engine.

In Table 4 we report the distribution of the answers on the Likert scale for the question of the web form concerning comprehensibility, that is, "Quanto sei d'accordo con la frase: - La frase pronunciata è facile da capire -" ("How much do you agree with the sentence: - The pronounced sentence is easy to understand -"). The value 1 corresponds to "per nulla" ("not at all") and the value 7 corresponds to "completamente" ("completely"). The data show no notable difference between the perceived comprehensibility of the W-engine and that of the E-engine: the t-test on the Likert scores gives a two-tailed p-value of 0.67.

Engine     U1           U2           U3           U4
W-engine   4.60 (0.55)  5.20 (1.10)  4.00 (0.71)  4.00 (1.22)
E-engine   4.60 (0.89)  4.00 (1.73)  5.60 (1.14)  4.40 (1.52)

Table 4: The distribution of the answers in Likert scale (1-7) for the question concerning comprehensibility.

4 Conclusion

In this paper we have presented a study on the generation of mathematical sentences, i.e. natural language sentences encoding mathematical expressions⁶. In particular, we have described the main features of the system and a first experiment centred on the evaluation of two distinct speech engines. The results of the experiment suggest a good performance of the formant-based synthesis engine with respect to the neural-network-based synthesis engine. However, more data is necessary to achieve statistical significance.

⁶ The described system can be freely downloaded at https://bitbucket.org/tesimagistralemonticone/formula-to-speech/

In future work we intend to repeat the evaluation of the system for Italian with a larger number of users and to repeat the experiment for the English language too.

References

[Ahmetovic et al.2018] Dragan Ahmetovic, Tiziana Armano, Cristian Bernareggi, Michele Berra, Anna Capietto, Sandro Coriasco, Nadir Murru, Alice Ruighi, and Eugenia Taranto. 2018. Axessibility: A LaTeX package for mathematical formulae accessibility in PDF documents. In Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS '18, pages 352–354, New York, NY, USA. ACM.

[Anderson et al.2016] Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2016. SPICE: Semantic propositional image caption evaluation. CoRR, abs/1607.08822.

[Cervone2012] Davide Cervone. 2012. MathJax: A platform for mathematics on the web. Notices of the American Mathematical Society, 59, 02.

[Chang1983] Lawrence A. Chang. 1983. Handbook for spoken mathematics (Larry's speakeasy). Lawrence Livermore Laboratory, The Regents of the University of California, 1.

[Conte et al.2017] Giorgia Conte, Cristina Bosco, and Alessandro Mazzei. 2017. Dealing with Italian adjectives in noun phrase: a study oriented to natural language generation. In Proceedings of the Fourth Italian Conference on Computational Linguistics (CLiC-it 2017), Rome, Italy, December 11-13, 2017.

[Ferres and Fuentes Sepúlveda2011] Leo Ferres and José Fuentes Sepúlveda. 2011. Improving accessibility to mathematical formulas: the Wikipedia math accessor. In Proceedings of the International Cross-Disciplinary Conference on Web Accessibility, W4A 2011, Hyderabad, Andhra Pradesh, India, March 28-29, 2011, page 25.

[Fuentes Sepúlveda and Ferres2012] José Fuentes Sepúlveda and Leo Ferres. 2012. Improving accessibility to mathematical formulas: The Wikipedia math accessor. New Rev. Hypermedia Multimedia, 18(3):183–204, September.

[Gatt and Krahmer2018] Albert Gatt and Emiel Krahmer. 2018. Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. J. Artif. Intell. Res., 61:65–170.

[Gatt and Reiter2009] Albert Gatt and Ehud Reiter. 2009. SimpleNLG: A Realisation Engine for Practical Applications. In Proceedings of the 12th European Workshop on Natural Language Generation, ENLG '09, pages 90–93, Stroudsburg, PA, USA. Association for Computational Linguistics.

[Ghezzi et al.2018] Ilaria Ghezzi, Cristina Bosco, and Alessandro Mazzei. 2018. Auxiliary selection in Italian intransitive verbs: A computational investigation based on annotated corpora. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), pages 1–6, Berlin. CEUR.

[Hutchins and Somers1992] W. John Hutchins and Harold L. Somers. 1992. An Introduction to Machine Translation. London: Academic Press.

[Mazzei et al.2016] Alessandro Mazzei, Cristina Battaglino, and Cristina Bosco. 2016. SimpleNLG-IT: adapting SimpleNLG to Italian. In Proceedings of the 9th International Natural Language Generation conference, pages 184–192, Edinburgh, UK, September 5-8. Association for Computational Linguistics.

[Mazzei2016] Alessandro Mazzei. 2016. Building a computational lexicon by using SQL. In Pierpaolo Basile, Anna Corazza, Francesco Cutugno, Simonetta Montemagni, Malvina Nissim, Viviana Patti, Giovanni Semeraro, and Rachele Sprugnoli, editors, Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), Napoli, Italy, December 5-7, 2016, volume 1749, pages 1–5. CEUR-WS.org, December.

[Miller2007] Bruce Miller. 2007. LaTeXML: A LaTeX to XML converter.

[Pandolfi2013] Luciano Pandolfi. 2013. ANALISI MATEMATICA 1. Dipartimento di Scienze Matematiche "Giuseppe Luigi Lagrange", Politecnico di Torino.

[Raman1996] T. V. Raman. 1996. Emacspeak - direct speech access. In Proceedings of the Second Annual ACM Conference on Assistive Technologies, Assets '96, pages 32–36, New York, NY, USA. ACM.

[Reiter and Dale2000] Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Studies in Natural Language Processing. Cambridge University Press.

[Schweikhardt et al.2006] Waltraud Schweikhardt, Cristian Bernareggi, Nadine Jessel, Benoit Encelle, and Margaret Gut. 2006. Lambda: A European system to access mathematics with Braille and audio synthesis. In Lecture Notes in Computer Science, volume 4061, Berlin. Springer.

[Soiffer2016] Neil Soiffer. 2016. A study of speech versus braille and large print of mathematical expressions. In Lecture Notes in Computer Science, volume 9758, Berlin. Springer.

[Sorge et al.2014] Volker Sorge, Charles Chen, T. V. Raman, and David Tseng. 2014. Towards making mathematics a first class citizen in general screen readers. In Proceedings of the 11th Web for All Conference, W4A '14, pages 40:1–40:10, New York, NY, USA. ACM.