The Praat script for MOMEL (MELodic MOdelisation), an algorithm proposed in 1991 by Daniel Hirst and Robert Espesser of the Institut de Phonétique d'Aix, represents the melodic curve, which describes how the laryngeal frequency (F0) varies over time, by means of a quadratic spline function.
F0 variations can be considered as the superposition of two phenomena : macroprosodic effects, which reflect the speaker's intonational choices, and microprosodic effects, which are linked to the phonetic constituents of the sentence. Macroprosody determines the global shape of the melodic curve, whereas microprosody produces local variations. The latter is often the case with consonants such as [f], [s] and [ch], which are unvoiced most of the time. For that reason, the F0 curve is discontinuous at such consonants.
A corpus was built using only consonants in intervocalic context (CVCV), which gives F0 curves that are unbroken except at prosodic boundaries. The MOMEL curve was calculated and the signal was resynthesised using PSOLA (Pitch Synchronous OverLap and Add). Informal preliminary perceptual tests were carried out to compare the signal resynthesised from the original F0 curve with the signal resynthesised from the MOMEL F0 curve. In most cases the two signals can be considered perceptually equivalent, except at boundary locations, where a resetting of F0 often occurs and may be attenuated by the quadratic approximation. We can therefore regard the MOMEL algorithm as a microprosody filter.
MOMEL Algorithm
The MOMEL algorithm rests on the assumption that the melodic curve can be approximated piecewise by a second-degree polynomial. A moving window of length A (typically 300 ms) covers the acoustic signal. In each window, the F0 curve is calculated and approximated by such a polynomial, chosen solely to minimise the squared error between the initial curve and the polynomial. Initial points that lie more than Delta (5% by default) below the polynomial are set to zero for the approximation process (this is the microprosody filter). A polynomial is then recalculated from the remaining points, and so on until no new points are set to zero.
The resulting polynomial is then taken as the best second-degree representation of the window; its vertex is calculated and saved as a candidate. The next step extracts the target points from these candidates.
To that end, the time axis is divided into partitions that separate the candidates from each other (the value of R must be chosen to suit the speaker's speech rate). In each partition, the mean of the candidates is calculated, together with their standard deviation (in time and in frequency). Candidates that fall outside the acceptance region (centred on the mean and delimited by the standard deviation) are deleted, and the mean is recalculated from the remaining points. This mean gives a target point.
The process is repeated for each partition in order to obtain enough target points to approximate the F0 curve well with a quadratic function. The last step of the algorithm is to fit a quadratic spline function through the target points.
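The fit, prune and vertex steps for a single window can be sketched as follows. This is a minimal illustration in Python, not the code of the Praat script: the function names fit_quadratic and window_candidate are hypothetical, rejected points are dropped from the list rather than set to zero, and the sample times inside a window are assumed to be distinct.

```python
def fit_quadratic(points):
    """Least-squares fit of f = a*t**2 + b*t + c to (t, f) pairs; returns (a, b, c)."""
    # Power sums for the 3x3 normal equations.
    s = [sum(t ** k for t, _ in points) for k in range(5)]
    r = [sum(f * t ** k for t, f in points) for k in range(3)]
    m = [[s[4], s[3], s[2], r[2]],
         [s[3], s[2], s[1], r[1]],
         [s[2], s[1], s[0], r[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def window_candidate(points, delta=0.05):
    """Return the vertex (time, f0) of the pruned fit, or None if no fit is possible."""
    kept = list(points)
    while len(kept) >= 3:
        a, b, c = fit_quadratic(kept)
        # Keep only points that are not more than delta below the polynomial.
        survivors = [(t, f) for t, f in kept
                     if f >= (1 - delta) * (a * t * t + b * t + c)]
        if len(survivors) == len(kept):
            if a == 0:
                return None  # a straight line has no vertex
            t_vertex = -b / (2 * a)
            return t_vertex, c - b * b / (4 * a)
        kept = survivors
    return None
```

Because only points below the polynomial are rejected, short dips in F0 are removed while the overall arch is preserved, which is what makes the step behave as a microprosody filter.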
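The reduction of one partition to a target point can be sketched as below. This is an illustrative Python sketch under stated assumptions: the candidates are assumed to be already grouped by partition, the acceptance region is taken as one standard deviation on each side of the mean in both time and frequency, the population standard deviation is used, and the name target_from_partition is hypothetical.

```python
from statistics import mean, pstdev


def target_from_partition(candidates):
    """Reduce the (time, f0) candidates of one partition to a single target point."""
    if not candidates:
        return None
    times = [t for t, _ in candidates]
    freqs = [f for _, f in candidates]
    t_mean, f_mean = mean(times), mean(freqs)
    t_sd, f_sd = pstdev(times), pstdev(freqs)
    # Keep candidates inside the acceptance region in both dimensions.
    kept = [(t, f) for t, f in candidates
            if abs(t - t_mean) <= t_sd and abs(f - f_mean) <= f_sd]
    if not kept:
        return None
    # The mean of the retained candidates is the target point.
    return mean(t for t, _ in kept), mean(f for _, f in kept)
```

For example, with four candidates clustered around 0.51 s and 121 Hz and one stray candidate at 0.90 s and 200 Hz, the stray candidate falls outside the acceptance region and the target is the mean of the other four.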
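One way to build such a quadratic spline is sketched below in Python. The construction is an assumption made for illustration, not taken from the script: each pair of consecutive targets is joined by two parabolas that meet halfway between them, with zero slope at the targets, and the curve is held constant before the first and after the last target. The name momel_spline is hypothetical.

```python
def momel_spline(targets, t):
    """Evaluate a quadratic spline through time-sorted (time, f0) targets at time t."""
    # Assumption: constant extrapolation outside the range of the targets.
    if t <= targets[0][0]:
        return targets[0][1]
    if t >= targets[-1][0]:
        return targets[-1][1]
    for (t1, h1), (t2, h2) in zip(targets, targets[1:]):
        if t1 <= t <= t2:
            x = (t - t1) / (t2 - t1)  # position in the segment, from 0 to 1
            if x <= 0.5:
                # First parabola: zero slope at the left target.
                return h1 + 2 * (h2 - h1) * x * x
            # Second parabola: zero slope at the right target.
            return h2 - 2 * (h2 - h1) * (1 - x) ** 2
```

With this construction the two parabolas share the same value and the same slope at the midpoint, so the resulting curve is smooth and passes through every target point.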
The following parameters can be changed when the program is called :
Hzmin : minimal accepted F0 value
Hzmax : maximal accepted F0 value
A : size of the initial analysis window (default 300ms)
Delta : maximal accepted error percentage for the polynomial approximation
R : size of the second window for the partitions choice (default 200ms)
The following figure shows an F0 curve on which the candidates have been plotted (black points), together with the target points (green circles) and the MOMEL quadratic approximation (in red).
Note that one target point falls outside the window, at a negative time. This happens with the previous version of the algorithm, in which the candidates are tested with a window of size 2xA instead of A. This has been changed in the PRAAT version, and the result is shown in figure 2.
Parameter values must be adjusted according to the speaker's speech rate. More precisely, when the speech rate increases, accents can be weaker and, above all, closer together. The F0 curve within a window of size A can then no longer be approximated by a second-order polynomial, so it is useful to decrease the size of this window. The same applies to the window of size R.
The Delta value (default 5%) can be kept whatever the speaker's speech rate, but for extremely fast speech it can be increased, allowing more error.
Finally, the Hzmin and Hzmax values can be kept whatever the speaker's speech rate, but they can be adapted to the speaker's sex. Typical values range from 80 to 250 Hz for a male voice and from 200 to 400 Hz for a female voice. These values are ignored for drawing under PRAAT (if the Draw_mode option is selected), because the script maximises the dynamic range of the "draw window" in order to show both the initial and the MOMEL F0 curves.
The Debug_mode option writes temporary files which can be used by other programs, as in the following call :
Praat momel.script sound.wav 5 30 20 80 500 No Yes Spline Momel
This command will run the script with the "sound.wav" file, with the following arguments :
Delta = 5
A = 30 (30 x 10 ms = 300 ms)
R = 20 (20 x 10ms = 200 ms)
Hzmin = 80 Hz
Hzmax = 500 Hz
Draw_mode : No
Debug_mode : Yes
Spline
MOMEL (Hirst & Espesser's algorithm)
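The conversion between the command-line values and the durations they stand for can be checked with a small helper. This Python sketch only restates the argument order and the 10 ms unit given above; the function name describe_momel_call and the dictionary keys are hypothetical, and the last two arguments are passed through unchanged.

```python
def describe_momel_call(args):
    """Map the positional arguments of the example call to named values."""
    delta, a, r, hzmin, hzmax, draw, debug = args[:7]
    frame_ms = 10  # A and R are given in units of 10 ms, as in the list above
    return {
        "delta_percent": float(delta),
        "A_ms": int(a) * frame_ms,
        "R_ms": int(r) * frame_ms,
        "hzmin": float(hzmin),
        "hzmax": float(hzmax),
        "draw_mode": draw == "Yes",
        "debug_mode": debug == "Yes",
        "remaining": list(args[7:]),  # trailing options, left as given
    }


params = describe_momel_call("5 30 20 80 500 No Yes Spline Momel".split())
```

Here params["A_ms"] is 300 and params["R_ms"] is 200, matching the defaults quoted for the two windows.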
The files sound.f0, sound.pref, sound.cibles and sound.momel are generated automatically; since they are ASCII files, they can be used by other programs.
Resynthesis under PRAAT
A second script, change_f0.script, is also available; it resynthesises the sound (with a PSOLA method) from the MOMEL F0 curve. The sound can thus be resynthesised with both the original F0 curve and the modelled curve, and the two versions compared. Psychoacoustic tests then make it possible to judge the quality of the approximation.