WorldCat Identities

Colotte, Vincent

Overview
Works: 7 works in 8 publications in 2 languages and 12 library holdings
Roles: Opponent, Author, Thesis advisor
Most widely held works by Vincent Colotte
The IFCASL Corpus of French and German Non-native and Native Read Speech by Jürgen Trouvain

1 edition published in 2017 in English and held by 2 WorldCat member libraries worldwide

Techniques d'analyse et de synthèse de la parole appliquées à l'apprentissage des langues by Vincent Colotte (Book)

2 editions published in 2002 in French and held by 2 WorldCat member libraries worldwide

Now that exchanges between people are increasingly international, mastering a foreign language is becoming essential, and computer-assisted language learning appears to be a new challenge. In particular, improving oral comprehension is one of the keys to mastering a language. To improve intelligibility, I developed a first strategy based on selectively slowing down the speech signal. Transient segments (regions with a high concentration of acoustic cues) turn out to be the best candidates for slowing down. These regions are detected by computing a coefficient that reflects the rate of spectral variation. I developed a second strategy that enhances relevant speech events, i.e., those whose amplification improves intelligibility. This strategy is based on preserving phonetic contrasts, in particular between voiced and unvoiced consonants. To this end, I developed an algorithm that detects unvoiced plosives and unvoiced fricatives using energy-based criteria. Two perception experiments were carried out to validate these intelligibility-improvement strategies: the first, a preliminary one, with French listeners on American English sentences, and the second with foreign students learning French as a foreign language on French sentences. Finally, to modify prosodic elements (rhythm, intensity, fundamental frequency), my work relied on the PSOLA method (Pitch Synchronous OverLap and Add). I developed a pitch-marking algorithm and improved the accuracy of the synthesis method. These strategies are fully automatic and improve the intelligibility of the speech signal in the context of language learning.
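The abstract detects transient regions from a coefficient reflecting the rate of spectral variation but does not give its formula. The following is a minimal sketch, assuming that coefficient is a spectral variation function defined as one minus the cosine similarity between the magnitude spectra of consecutive Hann-windowed frames. The function names, frame sizes and the 0.3 threshold are hypothetical illustrative choices, not the thesis's own definition.

```python
import numpy as np

def spectral_variation(signal, frame_len=512, hop=256):
    """Assumed spectral variation coefficient: 1 - cosine similarity between
    the magnitude spectra of consecutive frames (hypothetical stand-in)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping, Hann-windowed frames
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    prev, curr = spectra[:-1], spectra[1:]
    num = np.sum(prev * curr, axis=1)
    den = np.linalg.norm(prev, axis=1) * np.linalg.norm(curr, axis=1) + 1e-12
    # The first frame has no predecessor, so its variation is set to 0
    return np.concatenate([[0.0], 1.0 - num / den])

def transient_frames(signal, threshold=0.3, **kwargs):
    """Indices of frames whose spectral variation exceeds the threshold;
    under this sketch, these would be the candidates for selective slowing."""
    return np.flatnonzero(spectral_variation(signal, **kwargs) > threshold)

# Toy signal: a 300 Hz tone that switches to 3000 Hz halfway through 1 s at 16 kHz
sr = 16000
t = np.arange(sr) / sr
toy = np.where(t < 0.5, np.sin(2 * np.pi * 300 * t), np.sin(2 * np.pi * 3000 * t))
print(transient_frames(toy))
```

Only frames straddling the switch at 0.5 s (around sample 8000, i.e. frame 31 with a 256-sample hop) should be flagged; steady-state frames of a pure tone have nearly identical magnitude spectra and a variation close to zero.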
Designing a bilingual speech corpus for French and German language learners by Jürgen Trouvain

1 edition published in 2017 in English and held by 2 WorldCat member libraries worldwide

Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process by Camille Fauth

1 edition published in 2017 in English and held by 2 WorldCat member libraries worldwide

Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic by Amal Houidhek

1 edition published in 2018 in English and held by 2 WorldCat member libraries worldwide

Synthèse audiovisuelle de la parole expressive : modélisation des émotions par apprentissage profond by Sara Dahmani

1 edition published in 2020 in French and held by 1 WorldCat member library worldwide

This thesis addresses the modelling of emotions for expressive audiovisual text-to-speech synthesis. Today, text-to-speech systems produce good-quality output; however, audiovisual synthesis remains an open problem, and expressive synthesis even more so. In this thesis, we propose a malleable and flexible method for modelling emotions that makes it possible to mix emotions the way one mixes hues on a color palette. In the first part, we present and study two expressive corpora that we built. The acquisition strategy and the expressive content of these corpora are analyzed to validate their use for audiovisual speech synthesis. In the second part, we propose two neural architectures for speech synthesis. We used these two architectures to model three aspects of speech: 1) the durations of speech sounds, 2) the acoustic modality, and 3) the visual modality. First, we adopted a fully connected architecture, which allowed us to study how neural networks behave with different contextual and linguistic features. We were also able to analyze, using objective measures, the network's ability to model emotions. The second proposed neural architecture is a variational autoencoder. This architecture can learn a latent representation of emotions without using emotion labels. After analyzing the latent emotion space, we proposed a procedure for structuring it so as to move from a categorical representation of emotions to a continuous one.
Through perceptual experiments, we validated our system's ability to generate emotions, nuances of emotions, and blends of emotions for expressive audiovisual text-to-speech synthesis.
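The "palette" idea of mixing emotions in a continuous latent space can be illustrated with a convex combination of per-emotion centroids. This is a minimal sketch, assuming latent codes from an already trained encoder and emotion labels used only for post-hoc analysis; the function names are hypothetical, and this is not the structuring procedure proposed in the thesis.

```python
import numpy as np

def emotion_centroids(latents, labels):
    """Mean latent vector for each emotion label.
    latents: (n, d) array of latent codes; labels: length-n sequence of names."""
    labels = np.asarray(labels)
    return {e: latents[labels == e].mean(axis=0) for e in np.unique(labels)}

def mix_emotions(centroids, weights):
    """Convex combination of emotion centroids, e.g. {"joy": 0.7, "surprise": 0.3}.
    Weights are normalised to sum to 1, like proportions of paint on a palette."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum((w / total) * centroids[e] for e, w in weights.items())
```

In such a setup, a decoder (not shown) would map the mixed latent vector back to durations and acoustic or visual parameters; varying the weights continuously would then yield nuances between emotions rather than discrete categories.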
Synthèse paramétrique de la parole Arabe by Amal Houidhek

1 edition published in 2020 in French and held by 1 WorldCat member library worldwide

This thesis deals with adapting parametric text-to-speech conversion to the Arabic language. Various methods have been developed to build synthesis systems. These methods describe the speech signal by a set of parameters. In addition, each sound is represented by a set of contextual features containing all the information that affects its pronunciation. Some of these features depend on the language and its peculiarities, so adapting the parametric synthesis approach to Arabic required a study of its phonological peculiarities. Two phenomena were identified: gemination and vowel quantity (short/long). Two features associated with these phenomena were added to the contextual feature set. Likewise, different approaches were proposed to model geminated consonants and long vowels as speech units. Four modelling combinations are possible, obtained by either differentiating or merging simple and geminated consonants on the one hand, and short and long vowels on the other. A set of perceptual and objective tests was conducted to evaluate the effect of the four unit-modelling approaches on the quality of the generated speech. The evaluations were carried out first for HMM-based parametric synthesis and then for DNN-based parametric synthesis. The subjective results showed that with HMMs, the four approaches produce signals of similar quality, a result confirmed by the objective measures computed to evaluate the prediction of speech-unit durations. However, the objective evaluations with DNNs showed that differentiating simple from geminated consonants (and, respectively, short from long vowels) leads to slightly better duration prediction than the other modelling approaches.
However, this improvement was not perceived in the perceptual tests: listeners judged the signals generated by the four approaches to be similar in overall quality. The last part of this thesis compares HMM-based synthesis with DNN-based synthesis. All the tests conducted showed that using DNNs improved the perceived quality of the generated signals.
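The four unit-modelling combinations come from two independent binary choices (merge or differentiate geminated consonants; merge or differentiate long vowels), i.e. a 2 × 2 design. The following is a minimal sketch, assuming a hypothetical label convention in which a doubled symbol marks a geminated consonant ("tt") or a long vowel ("aa"); the function names and phone labels are illustrative, not those used in the thesis.

```python
from itertools import product

VOWELS = {"a", "i", "u"}  # the three Arabic short vowels (hypothetical one-letter labels)

def to_unit(phone, merge_geminates, merge_long_vowels):
    """Map a phone label to a modelling unit under one strategy.
    Assumed notation: a doubled symbol is a geminate or a long vowel."""
    doubled = len(phone) == 2 and phone[0] == phone[1]
    if not doubled:
        return phone
    base = phone[0]
    if base in VOWELS:
        # Long vowel: merged with its short counterpart or kept as its own unit
        return base if merge_long_vowels else phone
    # Geminated consonant: merged with the simple consonant or kept distinct
    return base if merge_geminates else phone

def unit_inventory(phones, merge_geminates, merge_long_vowels):
    """Sorted set of units produced by one modelling strategy."""
    return sorted({to_unit(p, merge_geminates, merge_long_vowels) for p in phones})

phones = ["k", "a", "tt", "aa", "b"]
for merge_gem, merge_long in product([False, True], repeat=2):
    print(merge_gem, merge_long, unit_inventory(phones, merge_gem, merge_long))
```

When a distinction is merged at the unit level, it would presumably be carried only by the added gemination and vowel-quantity contextual features.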
 
Audience Level
Audience level: 0.96 (from 0.92 for Techniques ... to 0.97 for Techniques ...)

Languages