Laurent GIRIN
Associate Professor INPG/ENSERG / Speech Scientist ICP

Born 09-26 1969 in Moutiers (Savoie-France)

Address: 47 rue des Alliés – 38000 Grenoble, France

Personal cellular phone: (+33)6 62 82 88 33

Research Laboratory

Institut de la Communication Parlée (Speech Communication Lab.)

Address: 46 avenue Félix Viallet - 38031 Grenoble Cedex 1

Phone: 04 76 57 47 15 – E-mail: girin@icp.inpg.fr – Web: http://www.icp.inpg.fr/~girin/

University

Institut National Polytechnique de Grenoble (INPG) (Engineering University)

Ecole Nationale Supérieure d'Electronique et de Radioélectricité de Grenoble (ENSERG)

(Department of Electronics and Telecommunications of the INPG)

Address: 23 rue des Martyrs - BP257 - 38016 Grenoble

Chronological summary

From 1999    Associate Professor at the INPG/ENSERG / Speech Scientist at the ICP

1997-99      Assistant Teacher at the INPG/ENSERG / Speech Scientist at the ICP

1994-97      Ph.D. from the INPG – Cum laude – INPG Best Ph.D. Award 1997

1994         Master of Science – INPG – Signals, Speech and Image

Research themes (full references for the cited publications are given later in this document)

Audiovisual speech processing

The aim of this work is to develop speech processing techniques and systems that exploit the natural coherence and complementarity between the acoustic and visual speech signals, in order to improve existing processes that are generally based on the acoustic signal alone (e.g. enhancement in noise). The visual signals consist of recorded movements of the visible speech articulators, especially the lips and possibly the jaw and cheeks. The main applications are:

-        Speech enhancement in noisy environments: The information contained in lip movements can be used to improve the intelligibility of speech in noise. In (Girin et al., 2001), LPC-based Wiener filters were estimated partly from the speaker’s lip shape parameters and partly from the noisy signal. These filters were then used to enhance the intelligibility of the acoustic speech.

-        Audiovisual speech source separation: This problem can be seen as an extension of the enhancement problem to the case where several speech and noise sources are mixed and the mixtures are captured by several sensors. In the audiovisual case, an additional video sensor tracks the lip movements of a specific speaker, so that an algorithm based on maximizing the correlation between audio and visual features can extract the corresponding speech signal. We first laid the foundations of the problem in the case of additive mixtures (Girin et al., 2001; Sodoyer et al., 2002a, 2002b, 2003a, 2003b, 2003c, to appear 2005) and we are now dealing with the more complex case of convolutive mixtures (Rivet et al., 2004, to appear 2005).

-        Audiovisual speech coding: The aim here is to exploit the coherence between audio and visual features (seen as a redundancy) to decrease the bit rate and/or the complexity of a joint audiovisual speech coder, compared to standard separate audio and video coders. Different structures of coders for lip parameters and LPC coefficients, based on concatenated or cascaded audiovisual vector/matrix quantization, were proposed and tested in (Girin, 2004).
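As an illustration of the Wiener filtering idea mentioned in the speech enhancement item above, here is a minimal sketch, not the published method. It assumes that the LPC coefficients and gain of the clean speech are already available (in the cited work they were estimated partly from lip parameters, which is not reproduced here) and that the noise power spectrum is known. The function names and the flat noise spectrum are hypothetical choices made for the example.

```python
import cmath
import math


def lpc_power_spectrum(lpc, gain, n_bins):
    """Power spectrum gain^2 / |A(e^{jw})|^2 of an all-pole (LPC) model.

    A(z) = 1 + sum_k lpc[k-1] * z^-k, sampled at n_bins frequencies in [0, pi).
    """
    spectrum = []
    for i in range(n_bins):
        w = math.pi * i / n_bins
        a = 1 + sum(c * cmath.exp(-1j * w * (k + 1)) for k, c in enumerate(lpc))
        spectrum.append(gain ** 2 / abs(a) ** 2)
    return spectrum


def wiener_gain(speech_psd, noise_psd):
    """Per-frequency Wiener gain H = Pss / (Pss + Pnn)."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


# Hypothetical example: first-order low-pass speech model, flat noise spectrum.
speech_psd = lpc_power_spectrum([-0.9], gain=1.0, n_bins=8)
noise_psd = [1.0] * 8
gains = wiener_gain(speech_psd, noise_psd)
```

The gain is close to 1 where the modeled speech spectrum dominates the noise and close to 0 elsewhere; multiplying the noisy spectrum by these gains gives the enhanced spectrum.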


Cued Speech analysis, modeling and synthesis

Since the beginning of 2003, I have been involved in studies on Cued Speech and its integration in telecommunication systems. Cued Speech is a language/method used by the part of the hearing-impaired community that is orally educated (but not necessarily able to pronounce speech correctly). It is designed to complement lipreading by associating lip shapes with cues formed by both the shape of the hand and its placement at specific locations around the face.

The ICP is currently involved in a national project called ARTUS, whose aim is to allow hearing-impaired people to replace the subtitles of television broadcasts with a synthetic clone delivering the subtitle information in Cued Speech (see http://www.lis.inpg.fr/la_recherche/artus/artus.htm). In this project, I am in charge of modeling and encoding/decoding the trajectories of the clone’s face and hand articulators under a very low bit-rate constraint (the transmission relies on a watermarking technique with a very small bandwidth).

In addition, I am the co-initiator and co-leader of a recent multi-laboratory project called TELMA, supported by the INPG. Its aim is to develop an automatic system for translating “mute” Cued Speech into acoustic speech (by video analysis, parameter conversion and/or recognition (Burger et al., 2004), and acoustic speech synthesis) and vice versa (by speech analysis, conversion and synthesis of an animated video clone). Such a system is intended to allow interactive, real-time communication between hearing-impaired Cued Speech users and normal-hearing users. The project includes a feasibility study of its integration in an autonomous telecommunication terminal, e.g. a cellular phone with video functionalities.

New trends in analysis, modeling and synthesis of speech

In 2002, I initiated, at the ICP and in collaboration with Sylvain Marchand from the LaBRI/University of Bordeaux, a series of works dealing with new developments and new applications of “classical” models of speech and audio signals such as the sinusoidal model. One of the major aims is to develop new, efficient models of the time trajectories of the underlying model parameters (e.g., the amplitude and phase trajectories of the sinusoidal model of speech). The applications include high-quality synthesis (Girin et al., 2003), good-quality / very low bit-rate speech coding, speech transformation and speech watermarking (Girin & Marchand, 2004). A major challenge at the heart of our current work is the development of long-term models that efficiently describe the dynamics of the speech parameters over long sections of speech, going well beyond the usual frame-by-frame modeling/synthesis approach (Girin et al., 2004; Girin & Firouzmand, to appear 2005).
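For readers unfamiliar with the sinusoidal model mentioned above, the following is a minimal sketch of frame synthesis as a sum of partials, each with an amplitude trajectory and a polynomial phase trajectory. The function names, the linear amplitude interpolation and the parameter layout are assumptions made for this illustration; they are not taken from the cited publications.

```python
import math


def synth_partial(amp_start, amp_end, phase_coeffs, n_samples):
    """Synthesize one partial over a frame.

    The amplitude is linearly interpolated from amp_start to amp_end
    (an assumption of this sketch). The phase at sample n is the polynomial
    sum_p phase_coeffs[p] * n**p, in radians.
    """
    out = []
    for n in range(n_samples):
        amp = amp_start + (amp_end - amp_start) * n / n_samples
        phase = sum(c * n ** p for p, c in enumerate(phase_coeffs))
        out.append(amp * math.cos(phase))
    return out


def synth_frame(partials, n_samples):
    """Sum the partials; each is (amp_start, amp_end, phase_coeffs)."""
    frame = [0.0] * n_samples
    for amp_start, amp_end, coeffs in partials:
        partial = synth_partial(amp_start, amp_end, coeffs, n_samples)
        for n, value in enumerate(partial):
            frame[n] += value
    return frame


# Hypothetical example: two stationary partials (first-order phase polynomials)
# at normalized frequencies 0.1 and 0.2 cycles per sample.
partials = [
    (1.0, 1.0, [0.0, 2 * math.pi * 0.1]),
    (0.5, 0.5, [0.0, 2 * math.pi * 0.2]),
]
frame = synth_frame(partials, 20)
```

A higher-order phase polynomial (more entries in `phase_coeffs`) lets the instantaneous frequency vary within the frame; long-term modeling extends the same idea to trajectories spanning many frames.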

Supervision of research works

I am currently the main advisor of one Ph.D. student (Mohammad Firouzmand) and co-advisor of another (Bertrand Rivet); both theses are still in progress. I was co-advisor of a third Ph.D. thesis, completed in 2004 (David Sodoyer).

I have been the main advisor for four six-month training periods of graduate students (Master of Science), and co-advisor for six others.


Main publications (from 2001)

International journals

Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (to appear 2005), Developing an audio-visual speech source separation algorithm, Speech Communication, In Press, Corrected Proof, Available online.

Girin, L. (2004), Joint matrix quantization of face parameters and LPC coefficients for low bit rate audiovisual speech coding, IEEE Transactions on Speech and Audio Processing, 12(3), pp. 265-276.

Sodoyer, D., Schwartz, J.-L., Girin, L., Klinkisch, J., & Jutten, C. (2002), Separation of audio-visual speech sources: A new approach exploiting the audiovisual coherence of speech stimuli, Eurasip Journal on Applied Signal Processing, 2002(11), pp. 1165-1173.

Girin, L., Schwartz, J.-L. & Feng, G. (2001), Audio-visual enhancement of speech in noise, Journal of the Acoustical Society of America, 109(6), pp. 3007-3020.

International conferences

Girin, L. & Firouzmand, M. (to appear 2005), Perceptually weighted long term modeling of sinusoidal speech amplitude trajectories, Accepted at Int. Conf. on Acoustics, Speech & Signal Processing (ICASSP 2005), Philadelphia, USA.

Rivet, B., Girin, L. & Jutten, C. (to appear 2005), Solving the indeterminations of blind source separation of convolutive speech mixtures, Accepted at Int. Conf. on Acoustics, Speech & Signal Processing (ICASSP 2005), Philadelphia, USA.

Girin, L., Firouzmand, M. & Marchand, S. (2004), Long term modeling of phase trajectories within the speech sinusoidal model framework, Proc. Int. Conf. on Spoken Language Processing (ICSLP 2004), Jeju, South Korea.

Burger, T., Beautemps, D. & Girin, L. (2004), Characterizing and classifying cued speech vowels from labial parameters, Proc. Int. Conf. on Spoken Language Processing (ICSLP 2004), Jeju, South Korea.

Rivet, B., Girin, L., Jutten, C. & Schwartz, J.-L. (2004), Using audiovisual speech processing to improve the robustness of the separation of convolutive speech mixtures, IEEE Int. Workshop on Multimedia Signal Processing (MMSP 2004), Siena, Italy.

Girin, L. & Marchand, S. (2004), Watermarking of speech signals using the sinusoidal model and frequency modulation of the partials, Proc. Int. Conf. on Acoustics, Speech & Signal Processing (ICASSP 2004), Montréal, Quebec.

Girin, L., Marchand, S., di Martino, J., Röbel, A. & Peeters, G. (2003), Comparing the order of a polynomial phase model for the synthesis of quasi-harmonic audio signals, Proc. IEEE Int. Workshop on Signal Processing and its Applications to Audio and Acoustics (WASPAA 2003), New Paltz, USA.

Girin, L. (2003), Pure audio McGurk effect, Proc. 5th Int. Conf. on Audio-Visual Speech Processing (AVSP 2003), Saint-Jorioz, France.

Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (2003), Further experiments on audio-visual speech source separation, Proc. 5th Int. Conf. on Audio-Visual Speech Processing (AVSP 2003), Saint-Jorioz, France, pp. 145-150.

Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (2003), Speech extraction based on ICA and audio-visual coherence, Proc. 7th International Symposium on Signal Processing and its Applications (ISSPA), Paris, France, pp. 65-68.

Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (2003), Extracting an AV speech source from a mixture of signals, Proc. 8th European Conference on Speech Communication and Technology (Eurospeech '03), Geneva, Switzerland, pp. 1393-1396.

Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (2002), Audio-visual speech sources separation, Proc. 7th International Conference on Spoken Language Processing (ICSLP’02), Denver, USA, pp. 1953-1956.

Girin, L., Allard, A. & Schwartz, J.-L. (2001), Speech signals separation: a new approach exploiting the coherence of audio and visual speech, IEEE Int. Workshop on Multimedia Signal Processing (MMSP’2001), Cannes, France.

Other research and/or administrative activities

Member of the Review Committee of the French journal Traitement du Signal since 1999.

Reviewer for international conferences: AVSP (AudioVisual Speech Processing), IEEE ISSPA (International Symposium on Signal Processing and its Applications), EUSIPCO (European Signal Processing Conference).

Elected member of the ICP Scientific Committee since January 2003.

Elected member of the INPG Recruiting Committee, Electrical Engineering Section, since 2001.

Academic activities

My position at the INPG/ENSERG involves a mandatory minimum teaching load of about 200 hours/year (depending on the type of activity; this amount generally represents the time actually spent with the students). The exact number of hours per activity since academic year 2000/01 is given in the following table. The supervision of ten graduate training periods should be added to these figures.

 

| Teaching activity | Level | 2000/01 | 2001/02 | 2002/03 | 2003/04 |
|---|---|---|---|---|---|
| Lecture in Signal compression (**) | Graduate | - | 16 | 16 | 21 |
| Lab works in Digital Signal Processing (design and implementation on DSPs of applications based on digital filters) (*) | Graduate | 40 | 86 | 130 | 130 |
| Lab works in Informatics (software and hardware programming, C language and assembler) (**) | Undergrad. | 110 | 70 | 60 | 76 |
| Lab works in Signal Processing | Undergrad. | 20 | 35 | 40 | 35 |
| Lab works in Electronics (basics) | Undergrad. | 120 | 120 | - | - |
| Lecture in Signal Processing Theory (*) (***) | Master (US) | 22 | 22 | 22 | 22 |
| Lecture in Signal Processing Theory (*) (***) | Bachelor (PT) | - | - | - | 60 |
| Total | | 312 | 349 | 268 | 344 |

 

(*) I am fully responsible for this activity (design of the lecture/lab work contents, student supervision, coordination with other teachers, evaluation, etc.)

(**) I share responsibility for this activity with another teacher.

(***) Activity carried out at the University of Savoie (US), Master of Science “Information Technologies”, and at the Politecnico di Torino (PT, Italy), Bachelor “Information Technologies”.