
Born 26 September 1969 in Moutiers (Savoie, France)
Address: 47 rue des Alliés, 38000 Grenoble, France
Personal mobile phone: (+33) 6 62 82 88 33
Institut de la Communication Parlée (ICP) (Speech Communication Lab.)
Address: 46 avenue Félix Viallet, 38031 Grenoble Cedex 1
Phone: 04 76 57 47 15; E-mail: girin@icp.inpg.fr; Web: http://www.icp.inpg.fr/~girin/
Institut National Polytechnique de Grenoble (INPG) (Engineering University)
Ecole Nationale Supérieure d'Electronique et de Radioélectricité de Grenoble (ENSERG) (Department of Electronics and Telecommunications of the INPG)
Address: 23 rue des Martyrs, BP 257, 38016 Grenoble
Since 1999: Associate Professor at the INPG/ENSERG / Speech Scientist at the ICP
1997-99: Assistant Teacher at the INPG/ENSERG / Speech Scientist at the ICP
1994-97: Ph.D., INPG, cum laude; INPG Best Ph.D. Award 1997
1994: Master of Science, INPG, Signals, Speech and Image
The aim of my work on audiovisual speech processing is to develop speech processing techniques and systems that exploit the natural coherence and complementarity between the acoustic and visual speech signals, in order to improve existing processes that are generally based on the acoustic signal alone (e.g. enhancement in noise). The visual signals consist of recorded movements of the visible speech articulators, especially the lips, and possibly the jaw and cheeks. The main applications are:
- Speech enhancement in noisy environments: The information contained in lip movements can be used to improve the intelligibility of speech in noise. In (Girin et al., 2001), LPC-based Wiener filters were estimated partly from the speaker's lip-shape parameters and partly from the noisy signal, and were then used to enhance the intelligibility of the acoustic speech.
- Audiovisual speech source separation: This problem can be seen as an extension of the enhancement problem to the case where several speech and noise sources are mixed and the mixtures are captured by several sensors. In the audiovisual case, an additional video sensor tracks the lip movements of a specific speaker, so that an algorithm maximizing the correlation between audio and visual features can extract the corresponding speech signal. We first laid the foundations of the problem for additive mixtures (Girin et al., 2001; Sodoyer et al., 2002a, 2002b, 2003a, 2003b, 2003c, to appear 2005) and are now addressing the more complex case of convolutive mixtures (Rivet et al., 2004, to appear 2005).
- Audiovisual speech coding: The aim here is to exploit the coherence between audio and visual features (seen as a redundancy) to decrease the bit-rate and/or the complexity of a joint audiovisual speech coder, compared with standard separate audio and video coders. Different coder structures for lip parameters and LPC coefficients, based on concatenated or cascaded audiovisual vector/matrix quantization, have been proposed and tested in (Girin, 2004).
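The enhancement principle of the first application can be illustrated, very loosely, by a frequency-domain Wiener gain in which the speech spectrum is an all-pole (LPC) model. This minimal sketch assumes the speech LPC coefficients and the noise spectrum are already known: how (Girin et al., 2001) estimates the filters partly from lip-shape parameters is not reproduced, and the function names (`lpc_power_spectrum`, `wiener_gain`, `enhance_frame`), the coefficients and the flat noise model are hypothetical choices for the example.

```python
import numpy as np

def lpc_power_spectrum(a, gain, n_fft=512):
    """All-pole (LPC) power spectrum gain**2 / |A(e^jw)|**2.

    `a` holds [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p;
    the result is sampled on the n_fft // 2 + 1 bins of a real FFT.
    """
    A = np.fft.rfft(a, n_fft)
    return gain ** 2 / np.abs(A) ** 2

def wiener_gain(speech_psd, noise_psd):
    """Wiener gain H = S / (S + N), computed independently in each bin."""
    return speech_psd / (speech_psd + noise_psd)

def enhance_frame(noisy_frame, a_speech, g_speech, noise_psd):
    """Filter one noisy frame with a Wiener gain whose speech PSD is an LPC model.

    Assumption: `noise_psd` is on the same scale as the LPC model spectrum and
    has n_fft // 2 + 1 bins; the frame is zero-padded to n_fft before the FFT.
    """
    n_fft = 2 * (len(noise_psd) - 1)
    speech_psd = lpc_power_spectrum(a_speech, g_speech, n_fft)
    H = wiener_gain(speech_psd, noise_psd)
    Y = np.fft.rfft(noisy_frame, n_fft)
    return np.fft.irfft(H * Y, n_fft)[: len(noisy_frame)]

# Hypothetical example: a first-order "speech" model with a pole at z = 0.9
# (low-pass envelope) against flat, unit-level noise.
noise_psd = np.ones(257)
H = wiener_gain(lpc_power_spectrum([1.0, -0.9], 1.0, 512), noise_psd)
print(round(H[0], 3), round(H[-1], 3))  # prints 0.99 0.217
```

The gain stays close to 1 where the speech model dominates and drops where the noise does. A practical system would process overlapping windowed frames and recombine them by overlap-add; the single-frame function only shows where the LPC model enters the Wiener gain.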
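The "concatenated" quantization mentioned in the audiovisual coding item can be read as a single codebook trained on vectors that stack lip parameters and LPC-derived parameters, so that one index per frame codes both streams. The sketch below is a plain k-means (Lloyd) vector quantizer under that assumption; the dimensions (3 lip parameters, 10 LPC-derived parameters) and the random training data are hypothetical, and neither the cascaded structures nor the matrix quantization of (Girin, 2004) are reproduced.

```python
import numpy as np

def encode(vectors, codebook):
    """Return, for each row of `vectors`, the index of the nearest codeword."""
    # Squared Euclidean distance between every vector and every codeword.
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def decode(indices, codebook):
    """Replace each index by its codeword."""
    return codebook[indices]

def train_codebook(vectors, size, iterations=20, seed=0):
    """Plain k-means (Lloyd) training, initialized with randomly chosen vectors."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size, replace=False)].copy()
    for _ in range(iterations):
        idx = encode(vectors, codebook)      # nearest-neighbour step
        for k in range(size):                # centroid step
            members = vectors[idx == k]
            if len(members):                 # keep old codeword if the cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook

# Hypothetical joint vectors: 3 lip parameters + 10 LPC-derived parameters per frame.
rng = np.random.default_rng(1)
lip = rng.standard_normal((200, 3))
lpc = rng.standard_normal((200, 10))
joint = np.hstack([lip, lpc])               # "concatenated" audiovisual vectors
codebook = train_codebook(joint, size=8)    # 8 codewords -> 3 bits per frame
indices = encode(joint, codebook)
reconstructed = decode(indices, codebook)
```

With a codebook of 2^b codewords, each frame of the joint audiovisual vector is transmitted as a single b-bit index.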
Cued Speech analysis, modeling and synthesis
Since the beginning of 2003, I have been involved in studies on Cued Speech and its integration into telecommunication systems. Cued Speech is a communication method used by the part of the hearing-impaired community that is orally educated (but not necessarily able to pronounce speech correctly). It is designed to complement lipreading by associating lip shapes with cues formed by both the shape of the hand and its placement at specific locations around the face.
The ICP is currently involved in a national project called ARTUS, whose aim is to allow hearing-impaired people to replace television broadcast subtitles with a synthetic clone delivering the subtitle information in Cued Speech (see http://www.lis.inpg.fr/la_recherche/artus/artus.htm). In this project, I am in charge of modeling and encoding/decoding the trajectories of the clone's face and hand articulators under a very low bit-rate constraint (transmission relies on a very-low-bandwidth watermarking technique).
In addition, I am co-designer and co-leader of TELMA, a recent multi-laboratory project supported by the INPG, whose aim is to develop an automatic system that translates “mute” Cued Speech into acoustic speech (by video analysis, parameter conversion and/or recognition, see (Burger et al., 2004), and acoustic speech synthesis) and vice versa (by speech analysis, conversion and synthesis of an animated video clone). Such a system is intended to allow interactive, real-time communication between hearing-impaired Cued Speech users and normal-hearing users. The project also includes a study of the feasibility of integrating the system into an autonomous telecommunication terminal, e.g. a mobile phone with video capabilities.
Since 2002, I have initiated at the ICP, in collaboration with Sylvain Marchand from the LaBRI/University of Bordeaux, a series of works on new developments and new applications within the framework of “classical” models of speech and audio signals such as the sinusoidal model. One of the major aims is to develop new, efficient models of the time trajectories of the underlying speech model parameters (e.g., models of the amplitude and phase trajectories of the sinusoidal model of speech). Applications include high-quality synthesis (Girin et al., 2003), good-quality/very-low-bit-rate speech coding, speech transformation and speech watermarking (Girin & Marchand, 2004). A major challenge at the heart of our current work is the development of long-term models that can efficiently describe the dynamics of speech parameters over long sections of speech, going well beyond the usual frame-by-frame modeling/synthesis approach (Girin et al., 2004; Girin & Firouzmand, to appear 2005).
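To make the idea of a long-term model concrete, the sketch below summarizes a whole amplitude trajectory (one value per analysis frame) by a few least-squares polynomial coefficients instead of one parameter per frame. The polynomial basis, the function names and the absence of any perceptual weighting are assumptions made for illustration; the models actually studied in (Girin et al., 2004; Girin & Firouzmand, to appear 2005) are not reproduced here.

```python
import numpy as np

def fit_long_term_trajectory(times, amplitudes, order):
    """Least-squares polynomial model of a whole parameter trajectory.

    Returns order + 1 coefficients (lowest degree first) and the time span,
    instead of one value per analysis frame.
    """
    t0, t1 = times[0], times[-1]
    # Map the section onto [-1, 1] to keep the least-squares fit well conditioned.
    u = 2.0 * (times - t0) / (t1 - t0) - 1.0
    coeffs = np.polynomial.polynomial.polyfit(u, amplitudes, order)
    return coeffs, (t0, t1)

def eval_long_term_trajectory(coeffs, span, times):
    """Resynthesize the trajectory at arbitrary times inside the modeled span."""
    t0, t1 = span
    u = 2.0 * (times - t0) / (t1 - t0) - 1.0
    return np.polynomial.polynomial.polyval(u, coeffs)

# Hypothetical example: 50 frames with a 10 ms hop and a smooth amplitude contour.
times = np.arange(50) * 0.01
amplitudes = 0.5 + 0.3 * np.sin(np.pi * times / times[-1])
coeffs, span = fit_long_term_trajectory(times, amplitudes, order=4)
model = eval_long_term_trajectory(coeffs, span, times)
```

Here five coefficients stand in for fifty frame-wise amplitudes; raising or lowering the order trades fidelity against the number of parameters to transmit.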
I am currently full advisor of one Ph.D. thesis (student: Mohammad Firouzmand) and co-advisor of another (student: Bertrand Rivet), both still in progress. I was co-advisor of a third thesis, completed in 2004 (student: David Sodoyer).
I have been full advisor of four six-month graduate training periods (Master of Science students) and co-advisor of six others.
Main publications (since 2001)
International journals
Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (to appear 2005), Developing an audio-visual speech source separation algorithm, Speech Communication, in press (corrected proof available online).
Girin, L. (2004), Joint matrix quantization of face parameters and LPC coefficients for low bit rate audiovisual speech coding, IEEE Transactions on Speech and Audio Processing, 12(3), pp. 265-276.
Sodoyer, D., Schwartz, J.-L., Girin, L., Klinkisch, J., & Jutten, C. (2002), Separation of audio-visual speech sources: A new approach exploiting the audiovisual coherence of speech stimuli, EURASIP Journal on Applied Signal Processing, 2002(11), pp. 1165-1173.
Girin, L., Schwartz, J.-L., & Feng, G. (2001), Audio-visual enhancement of speech in noise, Journal of the Acoustical Society of America, 109(6), pp. 3007-3020.
International conferences
Girin, L., & Firouzmand, M. (to appear 2005), Perceptually weighted long term modeling of sinusoidal speech amplitude trajectories, accepted at Int. Conf. on Acoustics, Speech & Signal Processing (ICASSP 2005), Philadelphia, USA.
Rivet, B., Girin, L., & Jutten, C. (to appear 2005), Solving the indeterminations of blind source separation of convolutive speech mixtures, accepted at Int. Conf. on Acoustics, Speech & Signal Processing (ICASSP 2005), Philadelphia, USA.
Girin, L., Firouzmand, M., & Marchand, S. (2004), Long term modeling of phase trajectories within the speech sinusoidal model framework, Proc. Int. Conf. on Spoken Language Processing (ICSLP 2004), Jeju, South Korea.
Burger, T., Beautemps, D., & Girin, L. (2004), Characterizing and classifying Cued Speech vowels from labial parameters, Proc. Int. Conf. on Spoken Language Processing (ICSLP 2004), Jeju, South Korea.
Rivet, B., Girin, L., Jutten, C., & Schwartz, J.-L. (2004), Using audiovisual speech processing to improve the robustness of the separation of convolutive speech mixtures, Proc. IEEE Int. Workshop on Multimedia Signal Processing (MMSP 2004), Siena, Italy.
Girin, L., & Marchand, S. (2004), Watermarking of speech signals using the sinusoidal model and frequency modulation of the partials, Proc. Int. Conf. on Acoustics, Speech & Signal Processing (ICASSP 2004), Montréal, Quebec.
Girin, L., Marchand, S., di Martino, J., Röbel, A., & Peeters, G. (2003), Comparing the order of a polynomial phase model for the synthesis of quasi-harmonic audio signals, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2003), New Paltz, USA.
Girin, L. (2003), Pure audio McGurk effect, Proc. 5th Int. Conf. on Audio-Visual Speech Processing (AVSP 2003), Saint-Jorioz, France.
Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (2003), Further experiments on audio-visual speech source separation, Proc. 5th Int. Conf. on Audio-Visual Speech Processing (AVSP 2003), Saint-Jorioz, France, pp. 145-150.
Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (2003), Speech extraction based on ICA and audio-visual coherence, Proc. 7th Int. Symposium on Signal Processing and its Applications (ISSPA 2003), Paris, France, pp. 65-68.
Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (2003), Extracting an AV speech source from a mixture of signals, Proc. 8th European Conference on Speech Communication and Technology (Eurospeech 2003), Geneva, Switzerland, pp. 1393-1396.
Sodoyer, D., Girin, L., Jutten, C., & Schwartz, J.-L. (2002), Audio-visual speech sources separation, Proc. 7th Int. Conf. on Spoken Language Processing (ICSLP 2002), Denver, USA, pp. 1953-1956.
Girin, L., Allard, A., & Schwartz, J.-L. (2001), Speech signals separation: a new approach exploiting the coherence of audio and visual speech, Proc. IEEE Int. Workshop on Multimedia Signal Processing (MMSP 2001), Cannes, France.
Other research and/or administrative activities
Member of the review committee of the French journal Traitement du Signal since 1999.
Reviewer for international conferences: AVSP (AudioVisual
Speech Processing), IEEE ISSPA (International Symposium on Signal
Processing and its Applications), EUSIPCO (European Signal Processing
Conference).
Elected member of the ICP Scientific Committee since January 2003.
Elected member of the INPG Recruiting Committee, Electrical Engineering Section, since 2001.
Academic activities
My position at the INPG/ENSERG involves a mandatory minimum teaching load of about 200 hours/year (the exact figure depends on the type of activity; it generally corresponds to the time actually spent with students). The number of hours for each activity since academic year 2000/01 is given in the following table. The supervision of ten graduate training periods should be added to these figures.
| Teaching activity (hours) | Level | 2000/01 | 2001/02 | 2002/03 | 2003/04 |
| Lecture in Signal compression (**) | Graduate | - | 16 | 16 | 21 |
| Lab sessions in Digital Signal Processing (design and implementation on DSPs of applications based on digital filters) (*) | Graduate | 40 | 86 | 130 | 130 |
| Lab sessions in Informatics (software and hardware programming, C language and assembler) (**) | Undergrad. | 110 | 70 | 60 | 76 |
| Lab sessions in Signal Processing | Undergrad. | 20 | 35 | 40 | 35 |
| Lab sessions in Electronics (basics) | Undergrad. | 120 | 120 | - | - |
| Lecture in Signal Processing Theory (*) (***) | Master (Univ. of Savoie) / Bachelor (Politecnico di Torino) | 22 | 22 | 22 | 22 + 60 |
| Total | | 312 | 349 | 268 | 344 |
(*) I am fully responsible for this activity (design of the lecture/lab-session content, student supervision, coordination with other teachers, evaluation, etc.).
(**) I am co-responsible for this activity with another teacher.
(***) Activity carried out at the University of Savoie (Master of Science “Information Technologies”) and at the Politecnico di Torino, Italy (Bachelor “Information Technologies”).