<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Possible Optimisation Procedure for US and MRI Tongue Contours</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Réka</forename><surname>Trencsényi</surname></persName>
							<email>trencsenyi.reka@science.unideb.hu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Electrical and Electronic Engineering</orgName>
								<orgName type="institution">University of Debrecen</orgName>
								<address>
									<settlement>Debrecen</settlement>
									<country key="HU">Hungary</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">László</forename><surname>Czap</surname></persName>
							<email>czap@uni-miskolc.hu</email>
							<affiliation key="aff1">
								<orgName type="department">Institute of Automation and Infocommunication</orgName>
								<orgName type="institution">University of Miskolc</orgName>
								<address>
									<settlement>Miskolc</settlement>
									<country key="HU">Hungary</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Possible Optimisation Procedure for US and MRI Tongue Contours</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">D7140FDFDBAB0BF8A143DF765484ECD4</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T13:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Data visualisation</term>
					<term>computational linguistics</term>
					<term>speech research</term>
					<term>dynamic US and MRI records</term>
					<term>automatic tongue contour tracking</term>
					<term>AMS Subject Classification: 68N19</term>
					<term>68P05</term>
					<term>68T50</term>
					<term>68U10</term>
					<term>68W99</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The topic of this article is speech research. The main instruments of the study are US and MRI records of human subjects made during speech. In the dynamic records, primarily the motion of the tongue is analysed and tracked by automatic tongue contour tracking algorithms. The tongue contours are used to elaborate geometric transformations between US and MRI frames, which are the starting points for optimising the matching of US and MRI tongue contours belonging to the same speech sound. As a result, the radial US geometry and the rectangular MRI geometry are embedded into each other in a biunique way.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>One of the fundamental tools of the study of speech production is the analysis of dynamic records of human speakers, made by ultrasound (US) <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b8">9]</ref> and magnetic resonance imaging (MRI) <ref type="bibr" target="#b6">[7]</ref> techniques. By investigating and processing these two-dimensional records, made in the so-called sagittal plane and thus showing a side view of the human body, relevant qualitative and quantitative information can be gained about the main features of articulation. Qualitative statements mainly refer to the relative position of the tongue and palate in the case of different speech sounds and sound transitions, while quantitative descriptions focus on the recognition and connection of the geometric parameters that are important for understanding the relationships between the acoustic and articulatory characteristics of speech. Quantitative analyses can be performed in a wide variety of ways <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>. The starting points of our present study are tongue contours fitted to the frames of US <ref type="bibr" target="#b9">[10]</ref> and MRI <ref type="bibr" target="#b7">[8]</ref> records by automatic algorithms <ref type="bibr" target="#b10">[11]</ref>. The US and MRI sources used differ from each other in many details, such as the gender and nationality of the speakers, the geometry, resolution, and scale of the images, and the visually evaluable anatomic segments of the vocal tract. 
The aim of our research work is to match the US and MRI sources by elaborating, applying, and optimising proper geometric transformations between the US and MRI tongue contours in a biunique way.</p><p>In the literature, several publications deal with the fusion of information arising from sources produced by different imaging techniques. The demand for automatic tongue contour tracking algorithms emerged already in the previous decade <ref type="bibr" target="#b1">[2]</ref>, confirming the necessity of fully automated procedures like our algorithm, which does not require any manual action, as it is based on dynamic programming <ref type="bibr" target="#b10">[11]</ref>. Another benefit of our present work is that we have been working with dynamic US and MRI records instead of static frames belonging exclusively to sustained sounds <ref type="bibr" target="#b0">[1]</ref>. The US videos were made by the Micro system of the MTA-ELTE Lendület Lingual Articulation Research Group of the Hungarian Academy of Sciences, and the MRI videos, made by fast MRI, were downloaded from the website of the University of Southern California. Studies have also appeared that aim to perform transformations between coordinate systems connected to US and MRI frames, relying on the optimisation of distances measured between special points of the human head <ref type="bibr" target="#b0">[1]</ref>. In comparison with <ref type="bibr" target="#b0">[1]</ref>, it must be emphasised that our transformations relate directly to the tongue contours, and the transformation is carried out in one step without any intermediate coordinate system, so starting from the US contour, one gets to the MRI contour immediately. Furthermore, the optimisation procedure minimises the global distance between the linked US and MRI tongue contours for more than one sound simultaneously.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Transformation and Optimisation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">The Geometrical Considerations and Mathematical Formulas of the Transformations for Tongue Contours</head><p>When writing the exact mathematical form of the transformation, we relied on the special geometry of the available US records. Namely, the US imaging head scans a radial region of the oral cavity that is seen at an angle of 90 ∘ measured from a fixed centre 𝐶. Consequently, it is natural to treat the US images and the points of the corresponding tongue contours in a polar coordinate system with origin 𝐶, where the position of each pixel is given unambiguously by the radius 𝑟 measured from point 𝐶 and the signed angle 𝜙 measured from the central vertical axis of the image. The aim of the transformation is to embed the radial geometry of the US frames into the rectangular geometry of the MRI records, described by two-dimensional Cartesian coordinates, so that the US and MRI tongue contours assigned to the same sound overlap with each other as much as possible.</p><p>The transformation of the US tongue contours can include three basic operations: the scaling of the radial range, the scaling of the angular range, and the translation of the angular range. The three operations can be realised mathematically by the formulas</p><formula xml:id="formula_0">𝑟 ′ = 𝑟 • 𝑅, 𝜙 ′ = 𝐹 𝐼 • 𝜙, 𝜙 ′ 0 = 𝜙 0 + 𝐹 𝐼 0 ,<label>(2.1)</label></formula><p>where the scale factors 𝑅 and 𝐹 𝐼 allow the normalisation of the radial and angular ranges, and the term 𝐹 𝐼 0 performs the translation of the initial angle of the angular range. The mathematical operations of (2.1) can be interpreted in the physical plane of the images in the following way: scaling the radial range modifies the magnification of the tongue contours, while scaling the angular range changes the width of the angular range covered by the tongue contours. 
The translation of the angular range rotates the tongue contours in the plane of the image. Thus, relationships (2.1) fit the US tongue contour to the corresponding MRI frame. Applying the inverse of transformations (2.1), the reverse conversion can also be executed, i.e., via the inverse operations</p><formula xml:id="formula_1">𝑟 = 𝑟 ′ 𝑅 , 𝜙 = 𝜙 ′ 𝐹 𝐼 , 𝜙 0 = 𝜙 ′ 0 − 𝐹 𝐼 0 ,<label>(2.2)</label></formula><p>the MRI tongue contour can be mapped onto the corresponding US frame. The parameter set {𝑅, 𝐹 𝐼, 𝐹 𝐼 0 } of the transformations performed in the US-MRI and MRI-US directions must be the same, since this ensures that the relative scale ratio of the US and MRI environments is maintained independently of the direction of the conversion. During the investigations, we fixed the factor 𝐹 𝐼 at 𝐹 𝐼 = 1, which means that the transformation is conformal. Transformations (2.1) and (2.2) become valid once the parameters 𝑅 and 𝐹 𝐼 0 are determined numerically, and the optimisation of the parameter values offers a possible way to do so. During the optimisation procedure, using an algorithm elaborated by us, we find the parameter set for which the distance between the transformed US tongue contour and the MRI tongue contour serving as a reference curve is minimal. The distance calculation is carried out for all possible pairs of points of the two curves, and then the average of the smallest distances assigned to each point of the US tongue contour is minimised <ref type="bibr" target="#b11">[12]</ref>. A successful transformation, however, requires not only the exact values of parameters 𝑅 and 𝐹 𝐼 0 but also centre 𝐶 ′ designated in the MRI frame, which is the image of centre 𝐶 of the US record. 
Beyond these, the peak of the epiglottis can also serve as a good reference point during the construction of the optimisation algorithm, as demonstrated by Figure <ref type="figure" target="#fig_0">1</ref>, where the peaks of the epiglottis 𝐺 and 𝐺 ′ are marked by green circles in the US and MRI frames, and the centres 𝐶 and 𝐶 ′ are marked by red crosses. The meaning of the parameter set {𝑅, 𝐹 𝐼 0 , 𝐶 ′ , 𝐺, 𝐺 ′ } can be understood from the geometrical considerations of Figure <ref type="figure" target="#fig_1">2</ref>, where the left-side block depicts the points of the US frames (𝐶 and 𝐺), while the right-side block carries the points of the MRI frames (𝐶 ′ and 𝐺 ′ ) in agreement with Figure <ref type="figure" target="#fig_0">1</ref>. The radial distances 𝑅 1 and 𝑅 2 are measured between the centre of the images and the peaks of the epiglottis. The polar angles 𝜙 1 and 𝜙 2 are made by the central vertical axis of the image and the radii 𝑅 1 and 𝑅 2 . Using these quantities, parameter 𝑅 is interpreted as a magnification factor by</p><formula xml:id="formula_2">𝑅 = 𝑅 2 𝑅 1 ,<label>(2.3)</label></formula><p>and parameter 𝐹 𝐼 0 is produced by</p><formula xml:id="formula_3">𝐹 𝐼 0 = 𝜙 1 + 𝜙 2 , (2.4)</formula><p>which is actually a difference, since 𝜙 1 is negative and 𝜙 2 is positive, as the polar angle is related to the vertical direction in both frames. </p></div>
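The transformations (2.1)-(2.2) and the parameter construction (2.3)-(2.4) can be sketched in Python as follows. This is our own minimal illustration, not the authors' MATLAB implementation: the function names are ours, and the angular translation 𝐹 𝐼 0 is applied here to every contour point, which with 𝐹 𝐼 = 1 corresponds to the rotation described in the text.

```python
import math

def us_to_mri(r, phi, R, FI=1.0, FI0=0.0):
    """Forward transformation (2.1): scale the radius by R, scale the
    angle by FI, and shift the angular range by FI0.
    FI = 1 keeps the transformation conformal, as in the paper."""
    return r * R, FI * phi + FI0

def mri_to_us(r_p, phi_p, R, FI=1.0, FI0=0.0):
    """Inverse transformation (2.2), using the same parameter set,
    which preserves the relative scale ratio of the two frames."""
    return r_p / R, (phi_p - FI0) / FI

def parameters_from_epiglottis(R1, R2, phi1, phi2):
    """Magnification R (2.3) and angular shift FI0 (2.4) from the
    radial distances and signed polar angles of the epiglottis peaks."""
    return R2 / R1, phi1 + phi2
```

Applying `us_to_mri` and then `mri_to_us` with the same parameter set returns the original point, i.e. the embedding is biunique.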
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">The Optimisation Procedure for Minimising the Distance Between Tongue Contours</head><p>Proceeding along the geometrical features of Figure <ref type="figure" target="#fig_1">2</ref>, we created an optimisation algorithm in MATLAB using mathematical formulas that enable the simultaneous optimisation of the parameters {𝑅, 𝐹 𝐼 0 , 𝐶 ′ , 𝐺, 𝐺 ′ }. Some details of the MATLAB scripts can be found in the appendix, where [𝑛1, 𝑛2], [𝑔1, 𝑔2], and [𝑔3, 𝑔4] are the coordinates of 𝐶 ′ , 𝐺, and 𝐺 ′ , respectively, while 𝐹 𝐼𝐾𝑂𝑅𝑅 stands for 𝐹 𝐼 0 . The construction of the mathematical formulas for 𝑅 1 , 𝑅 2 and 𝐹 𝐼1, 𝐹 𝐼2 follows the geometrical structure of Figure <ref type="figure" target="#fig_1">2</ref>. As explained in (2.3) and (2.4), the scale factor 𝑅 is the ratio of the radial distances between the centre and the peak of the epiglottis, measured separately in the US and MRI frames, and the angular displacement 𝐹 𝐼 0 is given as the sum of the signed polar angles of the US and MRI frames, made by a radial section and the central vertical axis of the image. In the MATLAB scripts, 𝑐𝑢1 denotes the actual US or MRI curve to be transformed, whose points have complex coordinates, 𝑢ℎ𝑢𝑠1 and 𝑚𝑟𝑖𝑠1 give the US and MRI curves to be compared, and 𝑛𝑛𝑢1 = 𝑛𝑛𝑢𝑚1 provides the average of the smallest distances over all possible pairs of points of the two tongue contours.</p><p>In the appendix, the code is written only for one of the contours belonging to the investigated speech sounds, labelled by index 1; to obtain the complete script, these program blocks must be repeated with the same structure for the other examined sounds, with indices 2, 3, 4, . . . , as well.</p></div>
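The distance measure described above, computed by the MATLAB loop in the appendix, can be sketched in Python as follows; the curves are represented here as lists of (x, y) points, and the names are ours.

```python
import math

def mean_min_distance(us_curve, mri_curve):
    """For each point of the US contour, find the distance to the
    nearest point of the MRI contour, then average these minima
    (the counterpart of nnu1 in the appendix)."""
    minima = [min(math.hypot(xu - xm, yu - ym) for xm, ym in mri_curve)
              for xu, yu in us_curve]
    return sum(minima) / len(minima)
```

Note that the measure is asymmetric: it averages over the points of the first curve, matching the role of the US contour in the appendix script.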
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">Results</head><p>We performed the optimisation of the parameters {𝑅, 𝐹 𝐼 0 , 𝐶 ′ , 𝐺, 𝐺 ′ } for two speech sounds, k and t, simultaneously, and obtained the following numerical results:</p><formula xml:id="formula_4">𝑅 = 0.4348, 𝐹 𝐼 0 = 0.1293 rad, 𝐶 ′ = (250, 140), 𝐺 = (244, 230), 𝐺 ′ = (158, 198).<label>(2.5)</label></formula><p>Based on the values in (2.5), it can be observed in Figure <ref type="figure" target="#fig_3">3</ref> that the optimised positions of the peaks of the epiglottis are very close to the static position taken when the voice box is at rest, without speaking. In addition, the centre of the MRI frame is located under the jaw of the speaker. Using the parameters of (2.5), the transformation of the US and MRI tongue contours can be implemented in a bidirectional way according to Figure <ref type="figure" target="#fig_4">4</ref>, where the tongue contours belonging to sound k can be seen. The green curves stand for the US tongue contours, while the red curves represent the MRI tongue contours. In the MRI frame, the contour of the palate is also drawn, by the yellow curve. The figures clearly show that the US and MRI tongue contours fit each other in an acceptable way: they demonstrate visually that minimising the distance with the optimisation algorithm detailed in the appendix makes the two curves mostly overlap. At this level, this is a sufficient criterion for accepting the results, without any specified lower or upper limit for the distance between the two curves, because we aimed to find the relative position of the two tongue contours corresponding to the minimal distance ensured by parameters (2.5). So, graphically, a realistic matching is obtained, and only this was expected. For instance, if the transformed tongue contour were outside the region of the oral cavity represented by the US or MRI image, or were placed visually at an unrealistic distance from the reference curve, then the optimisation would surely be false. 
To verify the results, we also checked, using the parameter setting (2.5) provided by the optimisation, the projection onto each other of US-MRI contour pairs that were not present in the set of sounds 𝑘 and 𝑡. Accordingly, Figure <ref type="figure" target="#fig_5">5</ref> exemplifies our results in the case of sound 𝑒. It can be stated that the matching of the US and MRI tongue contours is approximately as good as in the case of sound 𝑘.</p></div>
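The simultaneous optimisation over several sounds can be illustrated with a simple exhaustive parameter search in Python. This is our own sketch under simplifying assumptions, not the authors' MATLAB script: it searches only over 𝑅 and 𝐹 𝐼 0 on a fixed grid with a fixed centre, whereas the full procedure also optimises 𝐶 ′ , 𝐺, and 𝐺 ′ .

```python
import math

def mean_min_distance(curve_a, curve_b):
    # Average, over curve_a, of the distance to the nearest point of curve_b.
    return sum(min(math.hypot(xa - xb, ya - yb) for xb, yb in curve_b)
               for xa, ya in curve_a) / len(curve_a)

def transform(curve, R, FI0, centre):
    # Apply (2.1) with FI = 1: convert each point to polar coordinates
    # about the centre (angle measured from the vertical axis), scale the
    # radius by R, rotate by FI0, and convert back to Cartesian.
    cx, cy = centre
    out = []
    for x, y in curve:
        r = math.hypot(x - cx, y - cy)
        phi = math.atan2(x - cx, y - cy)  # signed angle from vertical
        r2, phi2 = r * R, phi + FI0
        out.append((cx + r2 * math.sin(phi2), cy + r2 * math.cos(phi2)))
    return out

def optimise(pairs, centre, R_grid, FI0_grid):
    """Find the (R, FI0) pair minimising the summed contour distance
    over all (US curve, MRI curve) pairs, i.e. over all sounds at once."""
    best = None
    for R in R_grid:
        for FI0 in FI0_grid:
            cost = sum(mean_min_distance(transform(us, R, FI0, centre), mri)
                       for us, mri in pairs)
            if best is None or cost < best[0]:
                best = (cost, R, FI0)
    return best
```

Passing the contour pairs of several sounds in `pairs` makes the search minimise the global distance for all of them simultaneously, as in the paper; a finer grid (or a gradient-free optimiser) would refine the result.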
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2.">Validation</head><p>Further validation of the results is currently in progress as well. To gain more experience about the harmonisation of US and MRI geometry and to improve the fitting of the tongue contours, we aim to develop our research work in several directions. We wish to extend the optimisation procedure to more than two sounds in order to understand the connection between the result of the optimisation and the number of speech sounds. We would also like to investigate the optimisation as a function of different sound contexts and of speakers of different nationalities and genders. Furthermore, advancing to a large number of speech sounds, we plan to involve machine learning algorithms as well.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 .</head><label>1</label><figDesc>Figure 1. The peaks of the epiglottis 𝐺 and 𝐺 ′ , and the centres 𝐶 and 𝐶 ′ in the US (a.) and MRI (b.) frames.</figDesc><graphic coords="4,68.99,306.42,170.29,127.78" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 .</head><label>2</label><figDesc>Figure 2. The graphical representation of the main parameters of the US (left) and MRI (right) frames.</figDesc><graphic coords="5,145.50,158.78,90.14,85.15" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 .</head><label>3</label><figDesc>Figure 3. The results of the optimisation for the centre of the MRI frame (b.) and the positions of the peaks of the epiglottis in the US (a.) and MRI (b.) frames.</figDesc><graphic coords="6,68.99,268.00,170.29,127.78" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 4 .</head><label>4</label><figDesc>Figure 4. The results of the optimisation in the case of sound k by presenting the US (green) and MRI (red) tongue contours simultaneously. The contour of the palate is indicated by the yellow curve in the MRI frame.</figDesc><graphic coords="7,68.99,114.69,170.29,127.78" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 5 .</head><label>5</label><figDesc>Figure 5. The results of the optimisation in the case of sound e by presenting the US (green) and MRI (red) tongue contours simultaneously. The contour of the palate is indicated by the yellow curve in the MRI frame.</figDesc><graphic coords="7,68.99,353.81,170.29,127.78" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements. We would like to thank the MTA-ELTE Lendület Lingual Articulation Research Group for providing the recordings with the Micro system.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Appendix</head><p>Program code for the optimisation  </p><p>Program code for the calculation of the distances between US and MRI tongue contours</p><p>K = 1;
clear dc1 du1
for q = K : lh1
    for qq = 1 : lm1
        b1 = imag(uhus1(q)) - imag(mris1(qq));
        b2 = real(uhus1(q)) - real(mris1(qq));
        dc1(qq) = sqrt(b1 * b1 + b2 * b2);
    end
    du1(q) = min(dc1);
end
nnu1 = mean(du1(K : length(du1)));</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Multimodal fusion of electromagnetic, ultrasound and MRI data for building an articulatory model</title>
		<author>
			<persName><forename type="first">M</forename><surname>Aron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-O</forename><surname>Berger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kerrien</surname></persName>
		</author>
		<idno>inria-00326290</idno>
		<ptr target="https://hal.inria.fr/inria-00326290/document" />
	</analytic>
	<monogr>
		<title level="m">8th International Seminar On Speech Production -ISSP&apos;08</title>
				<meeting><address><addrLine>Strasbourg, France</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2008-12">Dec 2008. 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Comparing articulatory images: An MRI/Ultrasound Tongue Image database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cleland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wrench</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Scobbie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Semple</surname></persName>
		</author>
		<ptr target="https://eresearch.qmu.ac.uk/handle/20.500.12289/2477" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 9th International Seminar on Speech Production</title>
				<meeting>the 9th International Seminar on Speech Production</meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="163" to="170" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Quantitative analysis of multimodal speech data</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">G</forename><surname>Danner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Barbosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goldstein</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.wocn.2018.09.007</idno>
		<ptr target="https://doi.org/10.1016/j.wocn.2018.09.007" />
	</analytic>
	<monogr>
		<title level="j">Journal of Phonetics</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page" from="268" to="283" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Speech synthesis from real time ultrasound images of the tongue</title>
		<author>
			<persName><forename type="first">B</forename><surname>Denby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stone</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICASSP.2004.1326078</idno>
		<ptr target="https://doi.org/10.1109/ICASSP.2004.1326078" />
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Acoustics, Speech, and Signal Processing</title>
				<imprint>
			<date type="published" when="2004">2004. 2004</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">685</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Geometry of the vocal tract and properties of phonation near threshold: calculations and measurements</title>
		<author>
			<persName><forename type="first">L</forename><surname>Fulcher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lodermeyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kahler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Becker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kniesburges</surname></persName>
		</author>
		<idno type="DOI">10.3390/app9132755</idno>
		<ptr target="https://doi.org/10.3390/app9132755" />
	</analytic>
	<monogr>
		<title level="j">Applied Sciences</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">2755</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Automated segmentation of upper airways from MRI-vocal tract geometry extraction</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ojalammi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Malinen</surname></persName>
		</author>
		<idno type="DOI">10.5220/0006138300770084</idno>
		<ptr target="https://doi.org/10.5220/0006138300770084" />
	</analytic>
	<monogr>
		<title level="j">International Conference on Bioimaging</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="77" to="84" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Speech MRI: morphology and function</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">D</forename><surname>Scott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wylezinska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Birch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Miquel</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ejmp.2014.05.001</idno>
		<ptr target="https://doi.org/10.1016/j.ejmp.2014.05.001" />
	</analytic>
	<monogr>
		<title level="j">Physica Medica</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="604" to="618" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">span | speech production and articulation knowledge group: the rtMRI IPA chart (John Esling)</title>
		<ptr target="https://sail.usc.edu/span/rtmri_ipa/je_2015.html" />
		<imprint>
			<date type="published" when="2020-05-09">2020 May 9th</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A guide to analysing tongue motion from ultrasound images</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stone</surname></persName>
		</author>
		<idno type="DOI">10.1080/02699200500113558</idno>
		<ptr target="https://doi.org/10.1080/02699200500113558" />
	</analytic>
	<monogr>
		<title level="j">Clinical Linguistics and Phonetics</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="455" to="501" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A comparative study on the contour tracking algorithms in ultrasound tongue images with automatic re-initialization</title>
		<author>
			<persName><forename type="first">K</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">G</forename><surname>Csapó</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Roussel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Denby</surname></persName>
		</author>
		<idno type="DOI">10.1121/1.4951024</idno>
		<ptr target="https://doi.org/10.1121/1.4951024" />
	</analytic>
	<monogr>
		<title level="j">Journal of the Acoustical Society of America</title>
		<imprint>
			<biblScope unit="volume">139</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="L154" to="L160" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Automatic tracking of tongue contours in ultrasound records</title>
		<author>
			<persName><forename type="first">L</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Czap</surname></persName>
		</author>
		<idno type="DOI">10.15775/Beszkut.2019</idno>
		<ptr target="https://doi.org/10.15775/Beszkut.2019" />
	</analytic>
	<monogr>
		<title level="j">Beszédtudomány -Speech Science</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="331" to="343" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Measuring lingual coarticulation from midsagittal tongue contours: Description and example calculations using English /t/ and /a</title>
		<author>
			<persName><forename type="first">N</forename><surname>Zharkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Hewlett</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.wocn.2008.10.005</idno>
		<ptr target="https://doi.org/10.1016/j.wocn.2008.10.005" />
	</analytic>
	<monogr>
		<title level="j">Journal of Phonetics</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="248" to="256" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
