WorldCat Identities

Jouvet, Denis (1956-....).

Works: 15 works in 19 publications in 2 languages and 26 library holdings
Roles: Other, Author, Thesis advisor, Opponent
Most widely held works by Denis Jouvet
Reconnaissance de mots connectes indépendamment du locuteur par des méthodes statistiques by Denis Jouvet( Book )

5 editions published in 1988 in French and held by 7 WorldCat member libraries worldwide

The main characteristics of the developed system are the representation of the full set of application sentences by a network, obtained by compiling all of the application's a priori knowledge (syntax, phonetic descriptions, phonological rules, etc.), and the use of Gaussian probability densities associated with the transitions.
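The abstract above describes decoding over a compiled network whose transitions carry Gaussian densities. A minimal sketch of that idea, assuming a toy two-state network with one-dimensional observations and arc-specific Gaussian parameters (all values hypothetical, not from the thesis):

```python
import numpy as np

def viterbi_gaussian_transitions(obs, trans, means, vars_):
    """Viterbi decoding on a network whose arcs carry Gaussian densities:
    each allowed arc (i, j) scores the current observation with its own
    Gaussian N(means[i, j], vars_[i, j]) before being traversed."""
    n = trans.shape[0]
    delta = np.full(n, -np.inf)
    delta[0] = 0.0                               # start in state 0
    for x in obs:
        # log Gaussian density of x on every arc
        logg = -0.5 * (np.log(2 * np.pi * vars_) + (x - means) ** 2 / vars_)
        scores = delta[:, None] + np.where(trans, logg, -np.inf)
        delta = scores.max(axis=0)               # best predecessor per state
    return float(delta[-1])                      # score ending in last state

trans = np.array([[True, True],                  # state 0: self-loop or advance
                  [False, True]])                # state 1: self-loop only
means = np.array([[0.0, 5.0],
                  [0.0, 5.0]])
vars_ = np.ones((2, 2))
score = viterbi_gaussian_transitions([0.1, 5.2], trans, means, vars_)
print(f"best-path log-score: {score:.3f}")
```

The two observations match the arcs 0→0 (mean 0) and 0→1 (mean 5), so the best path advances through the network.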
The IFCASL Corpus of French and German Non-native and Native Read Speech by Jürgen Trouvain( )

1 edition published in 2017 in English and held by 2 WorldCat member libraries worldwide

Prédiction de performances des systèmes de Reconnaissance Automatique de la Parole by Zied Elloumi( )

1 edition published in 2019 in French and held by 2 WorldCat member libraries worldwide

In this thesis, we focus on performance prediction of automatic speech recognition (ASR) systems. This is a very useful task for measuring the reliability of transcription hypotheses on a new data collection, when the reference transcription is unavailable and the ASR system used is unknown (black box). Our contribution covers several areas: first, we propose a heterogeneous French corpus for training and evaluating ASR prediction systems. We then compare two prediction approaches: a state-of-the-art (SOTA) performance predictor based on engineered features, and a new strategy based on features learnt with convolutional neural networks (CNNs). While the joint use of textual and signal features did not help the SOTA system, combining these inputs in the CNN yields the best WER prediction performance. We also show that the CNN predictor captures the shape of the WER distribution on a collection of speech recordings remarkably well. We then analyze the factors impacting both prediction approaches, and assess the impact of the training-set size of the prediction systems, as well as the robustness of systems trained on the outputs of one particular ASR system and used to predict performance on a new data collection. Our experimental results show that both prediction approaches are robust, and that the prediction task is more difficult on short speech turns and on spontaneous speech. Finally, we investigate which information is captured by our neural model and how it relates to different factors. Our experiments show that intermediate representations in the network automatically encode information on speech style, speaker accent and broadcast program type. To take advantage of this analysis, we propose a multi-task system that is slightly more effective on the performance prediction task.
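The CNN-based WER predictor described above maps a sequence of acoustic features to a single error rate. A minimal NumPy sketch of that architecture (convolution, pooling, linear output squashed to a valid rate); all shapes and weights here are illustrative, not the thesis's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, kernels):
    """Valid 1-D convolution of a (T, F) feature matrix with (K, W, F)
    kernels; returns a (T - W + 1, K) feature map."""
    T, F = x.shape
    K, W, _ = kernels.shape
    out = np.empty((T - W + 1, K))
    for t in range(T - W + 1):
        window = x[t:t + W]                          # (W, F)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

def predict_wer(features, kernels, w, b):
    """CNN-style WER regressor: conv -> ReLU -> global average pooling ->
    linear, with a sigmoid so the output lies in (0, 1) like an error rate."""
    h = np.maximum(conv1d(features, kernels), 0.0)   # ReLU
    pooled = h.mean(axis=0)                          # global average pooling
    return float(1.0 / (1.0 + np.exp(-(pooled @ w + b))))

# Toy input: 50 frames of 13-dimensional acoustic features (e.g. MFCCs).
feats = rng.standard_normal((50, 13))
kernels = rng.standard_normal((8, 5, 13)) * 0.1      # 8 kernels of width 5
w, b = rng.standard_normal(8) * 0.1, 0.0
wer_hat = predict_wer(feats, kernels, w, b)
print(f"predicted WER: {wer_hat:.3f}")
```

In practice the weights would be trained by regression against measured WER on transcribed data; here they are random, so only the shape of the computation is meaningful.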
Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process by Camille Fauth( )

1 edition published in 2017 in English and held by 2 WorldCat member libraries worldwide

Evaluation of speech unit modelling for HMM-based speech synthesis for Arabic by Amal Houidhek( )

1 edition published in 2018 in English and held by 2 WorldCat member libraries worldwide

Designing a bilingual speech corpus for French and German language learners by Jürgen Trouvain( )

1 edition published in 2017 in English and held by 2 WorldCat member libraries worldwide

Reconnaissance et traduction automatique de la parole de vidéos arabes et dialectales by Mohamed Amine Menacer( )

1 edition published in 2020 in French and held by 1 WorldCat member library worldwide

This research was carried out in the framework of the AMIS project (Access to Multilingual Information and opinionS), a European project that aims to help people understand the main idea of a video in a foreign language by generating an automatic summary of it. In this thesis, we focus on the automatic recognition and translation of speech in Arabic and dialectal videos. The statistical approaches proposed in the literature for automatic speech recognition are language-independent and applicable to modern standard Arabic. However, this language has some characteristics that need to be taken into consideration in order to boost the performance of the speech recognition system. Among these characteristics is the absence of short vowels in written text, which makes acoustic-model training difficult. We proposed several acoustic and/or language modeling approaches in order to better recognize Arabic speech. In the Arab world, modern standard Arabic is not the mother tongue; daily conversations are carried out in dialect, a variety of Arabic inspired by modern standard Arabic but also by other languages. We worked on adapting the speech recognition system developed for modern standard Arabic to the Algerian dialect, one of the most difficult variants of Arabic for automatic speech recognition systems. This is mainly due to words borrowed from other languages, code-switching and the lack of resources. Our approach to overcoming these problems is to take advantage of oral and textual data from other languages that have influenced the dialect, in order to train the models required for dialect speech recognition. The text produced by the Arabic speech recognition system was then used for machine translation. As a starting point, we conducted a comparative study between the phrase-based approach and the neural approach used in machine translation.
We then adapted these two approaches to translate code-switched text. Our study focused on the mix of Arabic and English in a parallel corpus extracted from official documents of the United Nations. In order to prevent error propagation in the pipeline system, we worked on adapting the vocabulary of the automatic speech recognition system, and proposed a new model that directly transforms a speech signal in language A into a sequence of words in another language B.
Reconnaissance de la parole pour l'aide à la communication pour les sourds et malentendants by Luiza Orosanu( )

1 edition published in 2015 in French and held by 1 WorldCat member library worldwide

This thesis is part of the RAPSODIE project, which aims at proposing a speech recognition device tailored to the needs of deaf and hearing-impaired people. Two aspects are studied: optimizing the lexical models and extracting para-lexical information. Regarding lexical modeling, we studied hybrid language models combining words and syllables, and proposed a new approach based on a similarity measure between words to add new words to the language model. Regarding the extraction of para-lexical information, we investigated the use of prosodic features, of linguistic features, and of their combination for the detection of questions and statements. This detection aims to inform deaf and hearing-impaired people when a question is addressed to them.
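The question/statement detection described above fuses prosodic and linguistic cues. A toy sketch of such a fusion, assuming one linguistic cue (an interrogative word or a question mark) and one prosodic cue (rising final pitch); the cue set, weights and scoring rule are hypothetical, not the thesis's classifier:

```python
import math

def question_score(text, final_f0_slope, w_ling=2.0, w_pros=3.0, bias=-1.0):
    """Toy question detector: logistic fusion of a binary linguistic cue
    (interrogative word or trailing '?') and a prosodic cue (positive final
    F0 slope, i.e. rising intonation). Returns P(question)."""
    interrogatives = {"qui", "que", "quoi", "comment", "pourquoi", "quand", "où"}
    tokens = text.lower().rstrip("?").split()
    ling = 1.0 if (text.strip().endswith("?")
                   or (tokens and tokens[0] in interrogatives)) else 0.0
    pros = max(0.0, final_f0_slope)          # only rising pitch counts
    z = w_ling * ling + w_pros * pros + bias
    return 1.0 / (1.0 + math.exp(-z))

q_hi = question_score("comment allez-vous", 0.8)   # both cues fire
q_lo = question_score("il fait beau", -0.2)        # neither cue fires
print(q_hi, q_lo)
```

A trained system would learn the weights from labeled speech turns; the point of the sketch is only the combination of the two feature families.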
Synthèse paramétrique de la parole Arabe by Amal Houidhek( )

1 edition published in 2020 in French and held by 1 WorldCat member library worldwide

The presented thesis deals with adapting the conversion of written text into speech, using a parametric approach, to the Arabic language. Different methods have been developed to set up synthesis systems; these methods are based on a description of the speech signal by a set of parameters. Besides, each sound is represented by a set of contextual features containing all the information affecting the pronunciation of that sound. Part of these features depend on the language and its peculiarities, so in order to adapt the parametric synthesis approach to Arabic, a study of its phonological peculiarities was needed. Two phenomena were identified: gemination and vowel quantity (short/long). Two features associated with these phenomena were added to the contextual feature set. Likewise, different approaches were proposed to model the geminated consonants and the long vowels of the speech units. Four combinations of modeling are possible: either differentiating or merging simple and geminated consonants on the one hand, and short and long vowels on the other. A set of perceptual and objective tests was conducted to evaluate the effect of the four unit-modelling approaches on the quality of the generated speech. The evaluations were made in the case of parametric synthesis by HMM, then in the case of parametric synthesis by DNN. The subjective results showed that when the HMM approach is used, the four approaches produce signals of similar quality, a result confirmed by the objective measures computed to evaluate the prediction of speech-unit durations. However, the objective evaluations in the case of the DNN approach showed that differentiating simple consonants (respectively short vowels) from geminated consonants (respectively long vowels) leads to a slightly better prediction of durations than the other modelling approaches.
On the other hand, this improvement was not perceived during the perceptual tests; listeners found the signals generated by the four approaches similar in terms of overall quality. The last part of this thesis was devoted to comparing the HMM synthesis approach to the DNN one. All the tests conducted showed that the use of DNNs improved the perceived quality of the generated signals.
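The contextual features described above include dedicated flags for gemination and vowel quantity. A small sketch of such a feature builder, assuming a hypothetical phone notation where ':' marks a geminated consonant and '+' a long vowel (the notation and field names are illustrative, not the thesis's label format):

```python
def context_features(phones):
    """Build per-phone contextual feature dicts for parametric synthesis.
    Besides the usual left/right phone context, two extra flags mirror the
    features added for Arabic: gemination and vowel quantity."""
    feats = []
    for i, ph in enumerate(phones):
        feats.append({
            "phone": ph.rstrip(":+"),
            "prev": phones[i - 1].rstrip(":+") if i > 0 else "#",
            "next": phones[i + 1].rstrip(":+") if i < len(phones) - 1 else "#",
            "geminated": ph.endswith(":"),     # geminated consonant flag
            "long_vowel": ph.endswith("+"),    # vowel-quantity flag
        })
    return feats

# A toy word with a long vowel /a+/ and a geminated consonant /d:/.
feats = context_features(["m", "a+", "d:", "a"])
for f in feats:
    print(f)
```

Merging versus differentiating units then amounts to either keeping or dropping these two flags when clustering the context-dependent models.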
Reducing development costs of large vocabulary speech recognition systems by Thiago Fraga Da Silva( )

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

Over the past decades, important advances have been made in large-vocabulary speech recognition. One of the remaining challenges in the field is reducing the development costs required to build a new system or to adapt an existing system to a new task, language or dialect. State-of-the-art speech recognition systems are based on the principles of statistical learning, using the information provided by two stochastic models: an acoustic model (AM) and a language model (LM). The standard methods used to build these models rely on two basic assumptions: that the training data sets are large enough, and that the training data match the target task well. It is well known that a large part of development costs comes from preparing corpora that satisfy these two conditions, the main source of cost being the manual transcription of audio data. Moreover, for some applications, notably the recognition of so-called "under-resourced" languages and dialects, data collection is itself a difficult task. The goal of this thesis is to examine and propose methods for reducing the need for manually transcribed audio data for a given task. Two research directions were followed. First, so-called "unsupervised" training methods are explored; their common point is the use of audio transcriptions obtained automatically with an existing recognition system. Unsupervised methods are explored for building three of the main components of recognition systems.
First, a new unsupervised AM training method is proposed: using several decoding hypotheses (instead of only the best one) leads to substantial performance gains over the standard approach. The unsupervised approach is also extended to estimating the parameters of the neural network (NN) used for acoustic feature extraction. This approach allows acoustic models to be built in a fully unsupervised way and yields results competitive with NNs estimated in a supervised way. Finally, unsupervised methods are explored for estimating standard back-off LMs and neural LMs. It is shown that unsupervised LM training brings performance gains that add (although modestly) to those obtained by unsupervised AM training. Second, this thesis proposes model interpolation as a fast and flexible alternative for building AMs for a target task. Models obtained by interpolation outperform the baseline models, notably those estimated on pooled data or those adapted to the target task. Model interpolation is shown to be particularly useful for the recognition of under-resourced dialects: when the amount of acoustic training data for the target dialect is small (2 to 3 hours) or even zero, model interpolation leads to considerable performance gains over standard methods.
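The model interpolation described above builds a target-task acoustic model as a weighted combination of existing models. A minimal sketch for diagonal-covariance GMMs with aligned components (the alignment assumption and the toy parameter values are illustrative, not the thesis's exact procedure):

```python
import numpy as np

def interpolate_gmms(models, lambdas):
    """Linear interpolation of component-aligned diagonal GMMs: the target
    model's weights, means and variances are convex combinations of the
    source models'. Assumes all models share the same component count."""
    assert abs(sum(lambdas) - 1.0) < 1e-9
    out = {}
    for key in ("weights", "means", "vars"):
        out[key] = sum(l * m[key] for l, m in zip(lambdas, models))
    out["weights"] = out["weights"] / out["weights"].sum()  # renormalize
    return out

# Two toy 2-component, 1-D models (e.g. a pooled model and a dialect model).
src = {"weights": np.array([0.5, 0.5]),
       "means": np.array([[0.0], [4.0]]),
       "vars": np.array([[1.0], [1.0]])}
tgt = {"weights": np.array([0.2, 0.8]),
       "means": np.array([[1.0], [5.0]]),
       "vars": np.array([[2.0], [2.0]])}
interp = interpolate_gmms([src, tgt], [0.7, 0.3])
print(interp["means"].ravel())   # [0.3, 4.3]
```

The interpolation weights would typically be tuned on the small amount of available target-dialect data.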
Traitement de l'incertitude pour la reconnaissance de la parole robuste au bruit by Dung Tien Tran( )

1 edition published in 2015 in English and held by 1 WorldCat member library worldwide

This thesis focuses on noise-robust automatic speech recognition (ASR). It has two parts. First, we focus on better accounting for uncertainty in order to improve ASR performance in noisy environments. Second, we present a method for speeding up neural network training using an auxiliary function. In the first part, a multichannel enhancement technique is applied to the noisy input speech. The posterior distribution of the underlying clean speech is then estimated and represented by its mean and its covariance matrix, or uncertainty. We show how to propagate the diagonal uncertainty covariance matrix in the spectral domain through the feature computation, in order to obtain the full uncertainty covariance matrix over the features. Uncertainty decoding exploits this posterior distribution to dynamically modify the acoustic model parameters at decoding time. The decoding rule simply consists of adding the uncertainty covariance matrix to the variance of each Gaussian. We then propose two uncertainty estimators, based respectively on fusion and on non-parametric estimation. To build a new estimator, we consider a linear combination of existing estimators or of kernel functions. The combination weights are estimated generatively by minimizing a divergence measure with respect to the oracle uncertainty. The divergence measures used are weighted versions of the Kullback-Leibler (KL), Itakura-Saito (IS) and Euclidean (EU) divergences. Because of the inherent non-negativity of the uncertainty, this estimation problem can be seen as an instance of weighted non-negative matrix factorization (NMF).
In addition, we propose two discriminative uncertainty estimators based on a linear or non-linear transformation of the generatively estimated uncertainty. This transformation is trained so as to maximize the boosted maximum mutual information (bMMI) criterion. We compute the derivative of this criterion using the chain rule and optimize it by stochastic gradient descent. In the second part, we introduce a new training method for neural networks based on an auxiliary function, without any parameter tuning. Instead of maximizing the objective function directly, this technique maximizes an auxiliary function that is introduced recursively layer by layer and whose minimum has a closed-form expression. Thanks to the properties of this function, the monotonic decrease of the objective function is guaranteed.
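The uncertainty-decoding rule stated above — add the uncertainty covariance to the variance of each Gaussian — is easy to show concretely. A sketch for a single diagonal Gaussian with toy values (the numbers are illustrative; a real system applies this per component, per frame):

```python
import numpy as np

def uncertain_loglik(mu_x, sigma_x, mu_g, var_g):
    """Uncertainty decoding for a diagonal Gaussian: instead of scoring a
    point estimate, score the enhanced-feature posterior N(mu_x, sigma_x)
    against the acoustic-model Gaussian N(mu_g, var_g). The rule reduces
    to adding the uncertainty variance to each Gaussian variance."""
    var = var_g + sigma_x                    # dynamic variance inflation
    d = mu_x - mu_g
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + d * d / var))

mu_x = np.array([1.0, 2.0])      # posterior mean of the clean features
sigma_x = np.array([0.5, 0.1])   # per-dimension uncertainty (variance)
mu_g = np.array([0.8, 2.5])      # acoustic-model Gaussian mean
var_g = np.array([1.0, 1.0])     # acoustic-model Gaussian variance

certain = uncertain_loglik(mu_x, np.zeros(2), mu_g, var_g)   # no uncertainty
uncertain = uncertain_loglik(mu_x, sigma_x, mu_g, var_g)
print(certain, uncertain)
```

With zero uncertainty the score reduces to the standard Gaussian log-likelihood; a non-zero uncertainty flattens the density, down-weighting unreliable dimensions.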
Réseaux de neurones récurrents pour le traitement automatique de la parole by Grégory Gelly( )

1 edition published in 2017 in French and held by 1 WorldCat member library worldwide

The field of automatic speech processing covers a very large number of tasks, including speech recognition, language identification and speaker identification. This research area has been studied since the middle of the twentieth century, but the last major technological breakthrough is relatively recent, dating from the early 2010s. That is when hybrid systems using deep neural networks (DNNs) appeared and very substantially improved the state of the art. Inspired by the performance gains brought by DNNs and by Alex Graves's work on recurrent neural networks (RNNs), we wished to explore the capabilities of the latter. Indeed, RNNs seemed better suited than DNNs to handling the temporal sequences of the speech signal. In this thesis, we focus in particular on Long Short-Term Memory (LSTM) RNNs, which overcome a number of difficulties encountered with standard RNNs. We extend this model and propose optimization procedures that improve the performance obtained on speech/non-speech segmentation and on language identification. In particular, we introduce cost functions dedicated to each of the two tasks: a WER-like cost for speech/non-speech segmentation, aimed at lowering the error rate of a speech recognition system, and a so-called angular proximity cost for multi-class classification problems such as spoken language identification.
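The angular proximity cost mentioned above operates on the angles between embeddings and class directions. A toy sketch of one plausible form of such a cost (this particular formula is an assumption for illustration, not the thesis's exact definition):

```python
import numpy as np

def angular_proximity_loss(emb, centroids, label):
    """Toy angular-proximity loss for multi-class classification: reward
    cosine similarity between an embedding and its own class centroid,
    penalize similarity to the closest competing centroid."""
    e = emb / np.linalg.norm(emb)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    cos = c @ e                          # cosine similarity to each centroid
    pos = cos[label]                     # similarity to the true class
    neg = np.max(np.delete(cos, label))  # hardest competing class
    return float(neg - pos)              # lower is better

# Two language classes with orthogonal centroid directions.
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
good = angular_proximity_loss(np.array([0.9, 0.1]), centroids, 0)
bad = angular_proximity_loss(np.array([0.1, 0.9]), centroids, 0)
print(good, bad)
```

An embedding aligned with its class centroid gets a lower loss than one aligned with a competitor, which is the behavior any angular cost of this family is designed to enforce.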
La construction automatique de ressources multilingues à partir des réseaux sociaux : application aux données dialectales du Maghreb by Karima Abidi( )

1 edition published in 2019 in French and held by 1 WorldCat member library worldwide

Automatic language processing is based on the use of language resources such as corpora, dictionaries, sentiment lexicons, morpho-syntactic analyzers, taggers, etc. For well-resourced languages, these resources are often available. On the other hand, when it comes to dealing with under-resourced languages, tools and data are often lacking. In this thesis, we are interested in some of the vernacular forms of Arabic used in the Maghreb. These forms are known as dialects, and can be classified as under-resourced languages. Apart from raw texts, generally extracted from social networks, there are few resources available for processing Arabic dialects. Compared to other under-resourced languages, they have several specificities that make them more difficult to process. We can mention, in particular, the lack of writing rules for these dialects, which leads users to write without following strict conventions, so the same word can have several spellings. Words in Arabic dialect can be written in the Arabic script and/or the Latin script (arabizi). The Arab dialects of the Maghreb are particularly impacted by foreign languages such as French and English. In addition to words borrowed from these languages, another phenomenon must be taken into account in automatic dialect processing: the problem known as code-switching, related to what linguistics calls diglossia. Users can thus write several languages in the same sentence: they can start in Arabic dialect and, mid-sentence, switch to French, English or modern standard Arabic. On top of this, there are several dialects within the same country, and a fortiori several different dialects across the Arab world. It is therefore clear that the classic NLP tools developed for modern standard Arabic cannot be used directly to process dialects.
The main objective of this thesis is to propose methods for automatically building resources for Arabic dialects in general, and for Maghreb dialects in particular. This represents our contribution to the effort made by the community working on Arabic dialects. We produced methods for building comparable corpora and lexical resources containing the different forms of an entry and their polarity. In addition, we developed methods for processing modern standard Arabic on Twitter data and on transcripts produced by an automatic speech recognition system operating on Arabic videos extracted from Arab television channels such as Al Jazeera, France24, Euronews, etc. We compared the opinions expressed in automatic transcriptions from different multilingual video sources related to the same subject, by developing a method based on the linguistic Appraisal theory.
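One concrete difficulty named above is that a single dialect word can have many spellings, in arabizi notably mixing letters and digits. A toy normalization that groups such variants under one key (the digit-to-letter mapping and squeezing rule are hypothetical simplifications, not the thesis's method):

```python
def normalize_arabizi(word):
    """Toy normalization collapsing common arabizi spelling variants:
    digits used as letters are rewritten (hypothetical mapping) and
    repeated characters are squeezed, so several spellings of the same
    dialect word hash to one key."""
    digit_map = {"2": "a", "3": "a", "5": "kh", "7": "h", "9": "q"}
    out = []
    for ch in word.lower():
        ch = digit_map.get(ch, ch)
        if not out or out[-1] != ch:
            out.append(ch)              # squeeze repeats: "sahha" -> "saha"
    return "".join(out)

# Four attested-style spellings of the same greeting-like word (toy data).
variants = ["saha", "sa7a", "sahha", "s7a"]
groups = {}
for v in variants:
    groups.setdefault(normalize_arabizi(v), []).append(v)
print(groups)
```

Three of the four spellings collapse to the same key; the fourth, with its vowel dropped, illustrates why real variant grouping also needs similarity measures rather than exact normalization alone.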
Structuration du modèle acoustique pour améliorer les performance de reconnaissance automatique de la parole by Arseniy Gorin( )

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

This thesis focuses on acoustic model structuring for improving HMM-based automatic speech recognition. The structuring relies on unsupervised clustering of the speech utterances of the training data in order to handle speaker and channel variability. The idea is to split the data into acoustically similar classes. In the conventional multi-modeling (or class-based) approach, separate class-dependent models are built via adaptation of a speaker-independent model. When the number of classes increases, less data becomes available for the estimation of the class-based models, and the parameters are less reliable. One way to handle such a problem is to modify the classification criterion applied to the training data, allowing a given utterance to belong to more than one class; this is obtained by relaxing the classification decision through a soft margin, and is investigated in the first part of the thesis. In the main part of the thesis, a novel approach is proposed that uses the clustered data more efficiently in a class-structured GMM. Instead of adapting all HMM-GMM parameters separately for each class of data, the class information is explicitly introduced into the GMM structure by associating a given density component with a given class. To efficiently exploit such a structured HMM-GMM, two different approaches are proposed. The first combines the class-structured GMM with class-dependent mixture weights: the Gaussian components are shared across speaker classes, but they are class-structured, and the mixture weights are class-dependent. For decoding an utterance, the set of mixture weights is selected according to the estimated class. In the second approach, the mixture weights are replaced by density component transition probabilities. The approaches proposed in the thesis are analyzed and evaluated on various speech data covering different types of variability sources (age, gender, accent and noise).
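The first approach above shares Gaussian components across classes but selects class-dependent mixture weights at decoding time. A minimal sketch of that scoring scheme with toy 1-D parameters (illustrative values, not the thesis's models):

```python
import numpy as np

def class_structured_loglik(x, means, vars_, class_weights, cls):
    """Log-likelihood under a class-structured GMM: Gaussian components
    (means, vars_) are shared across speaker classes, but each class has
    its own row of mixture weights; decoding selects the row matching the
    utterance's estimated class."""
    w = class_weights[cls]
    comp = -0.5 * np.sum(np.log(2 * np.pi * vars_)
                         + (x - means) ** 2 / vars_, axis=1)
    return float(np.log(np.sum(w * np.exp(comp))))

means = np.array([[0.0], [5.0]])         # shared components
vars_ = np.array([[1.0], [1.0]])
class_weights = np.array([[0.9, 0.1],    # class 0 favours component 0
                          [0.1, 0.9]])   # class 1 favours component 1
x = np.array([0.2])                      # observation near component 0
ll0 = class_structured_loglik(x, means, vars_, class_weights, 0)
ll1 = class_structured_loglik(x, means, vars_, class_weights, 1)
print(ll0, ll1)
```

The observation near component 0 scores higher under class 0's weights, showing how the shared components plus per-class weights encode class information without duplicating the Gaussians.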
Composition sémantique pour la langue orale by Frédéric Duvert( )

1 edition published in 2010 in French and held by 1 WorldCat member library worldwide

The thesis presented here aims to provide systems for the detection and composition of semantic components, and for semantic interpretation, in spoken natural language understanding. This understanding relies on an automatic speech recognition system that converts the signal into oral statements usable by the machine. The transcribed speech signal contains a number of errors related to recognition (noise, poor pronunciation, ...). Interpreting such a statement is difficult because it comes from spoken discourse, which is subject to disfluencies, self-corrections, and so on. The statement is all the more ungrammatical because spoken discourse is itself ungrammatical, so applying grammatical analysis methods to the output of speech transcription does not produce good interpretation results. Deep syntactic analysis methods should be avoided; a shallow analysis is considered instead. A primary objective is to provide a representation of meaning. We consider ontologies to conceptualize the world being described, and semantic components can be expressed in first-order predicate logic. In the work described here, we represent semantic elements by frames (FrameNet). Frames are hierarchical structures, fragments of knowledge that can be inserted into, merged with, or used to infer other fragments of knowledge, and they can be expressed as logical formulas. We propose a speech understanding system based on logical rules, with the support of an ontology, in order to create links between semantic components. We then conducted a study of the syntactic supports of semantic relationships. We propose a compositional semantics experiment to enrich the basic semantic components. Finally, we present a system that detects lambda-expression hypotheses to find relationships across the discourse.
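The frame composition described above merges knowledge fragments when their slots are compatible. A toy sketch of that operation, with frames as plain dicts and hypothetical slot names (this is an illustration of the merge idea, not the thesis's FrameNet machinery):

```python
def merge_frames(f1, f2):
    """Toy frame composition: two knowledge fragments merge when their
    shared slots do not conflict; filled slots from either frame survive
    in the composed frame, and a conflict blocks the merge."""
    merged = dict(f1)
    for slot, value in f2.items():
        if slot in merged and merged[slot] not in (None, value):
            return None                  # conflicting fillers: no merge
        if value is not None:
            merged[slot] = value
    return merged

# A partially filled frame enriched by a later fragment of the discourse.
booking = {"frame": "Reserving", "item": "table", "time": None}
detail = {"frame": "Reserving", "time": "20:00"}
composed = merge_frames(booking, detail)
print(composed)
```

Composition thus accumulates information across an utterance, while an incompatible fragment (say, a different `frame` value) simply fails to merge and is left for another hypothesis.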
Audience Level
Audience level: 0.96 (from 0.94 for Reconnaiss ... to 0.99 for Réseaux d ...)

French (12)

English (7)