WorldCat Identities

Precioso, Frédéric

Overview
Works: 40 works in 59 publications in 2 languages and 510 library holdings
Roles: Opponent, Other, Thesis advisor, Author
Most widely held works by Frédéric Precioso
Visual indexing and retrieval by Jenny Benois-Pineau

17 editions published in 2012 in English and held by 466 WorldCat member libraries worldwide

Research in content-based indexing and retrieval of visual information such as images and video has become one of the most active directions in the vast area of information technologies. Social networks such as YouTube, Facebook, FileMobile, and DailyMotion host and supply facilities for accessing a tremendous amount of professional and user-generated data. Areas of societal activity such as video protection and security also generate thousands and thousands of terabytes of visual content. This book presents the most recent results and important trends in visual information indexing and retrieval. It is intended for young researchers as well as professionals looking for an algorithmic solution to a problem.
Boosted kernel for image categorization by Alexis Lechervy

1 edition published in 2013 in English and held by 2 WorldCat member libraries worldwide

Extraction et reconnaissance de primitives dans les façades de Paris à l'aide d'appariement de graphes by Jean-Emmanuel Haugeard (Book)

2 editions published in 2010 in French and held by 2 WorldCat member libraries worldwide

Over the last decade, 3D city modeling has become one of the challenges of multimedia search and an important focus in object recognition. In this thesis we are interested in locating various primitives, especially windows, in the facades of Paris. First, we present an analysis of the properties of facades and windows. We then propose an algorithm able to extract window candidates automatically. In a second part, we discuss the extraction and recognition of primitives using graph matching of contours. Indeed, an image of contours is readable by the human eye, which uses perceptual grouping and distinguishes between the entities present in the scene; it is this mechanism that we have tried to replicate. The image is represented as an adjacency graph of contour segments, weighted by orientation and proximity information between contour segments. For inexact graph matching, we propose several variants of a new similarity measure based on sets of paths, able to group several contours and robust to scale changes. The similarity between paths takes into account both the similarity of the sets of contour segments and the similarity of the regions defined by these paths. The selection of images containing a particular object from a database is done using a KNN or SVM classifier.
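
To make the path-based similarity concrete, here is a minimal Python sketch of its elementary building block: scoring two contour segments by orientation and proximity. It is illustrative only; the function names and Gaussian weighting are assumptions, not the thesis code.

```python
import math

def segment_orientation(p1, p2):
    """Orientation of a contour segment, in radians, folded into [0, pi)."""
    return math.atan2(p2[1] - p1[1], p2[0] - p1[0]) % math.pi

def segment_similarity(s1, s2, sigma_theta=0.3, sigma_d=20.0):
    """Similarity in [0, 1] combining orientation and proximity cues."""
    th1, th2 = segment_orientation(*s1), segment_orientation(*s2)
    dtheta = min(abs(th1 - th2), math.pi - abs(th1 - th2))
    # Midpoint distance as a crude proximity measure between segments.
    m1 = ((s1[0][0] + s1[1][0]) / 2, (s1[0][1] + s1[1][1]) / 2)
    m2 = ((s2[0][0] + s2[1][0]) / 2, (s2[0][1] + s2[1][1]) / 2)
    d = math.dist(m1, m2)
    return math.exp(-(dtheta / sigma_theta) ** 2) * math.exp(-(d / sigma_d) ** 2)

# Two nearly parallel, nearby segments score close to 1.
print(segment_similarity(((0, 0), (10, 0)), ((0, 5), (10, 6))))
```
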
Apprentissage et exploitation de représentations sémantiques pour la classification et la recherche d'images by Maxime Bucher

2 editions published in 2018 in French and held by 2 WorldCat member libraries worldwide

In this thesis, we examine some practical difficulties of deep learning models. Indeed, despite their promising results in computer vision, implementing them in some situations raises some questions. For example, in classification tasks where thousands of categories have to be recognized, it is sometimes difficult to gather enough training data for each category. We propose two new approaches for this learning scenario, called 'zero-shot learning'. We use semantic information to model classes, which allows us to define models by description, as opposed to modeling from a set of examples. In the first chapter, we propose to optimize a metric in order to transform the distribution of the original data and to obtain an optimal attribute distribution. In the following chapter, unlike the standard approaches in the literature that rely on learning a common embedding space, we propose to generate visual features from a conditional generator. The artificial examples can be used in addition to real data for learning a discriminative classifier. In the second part of this thesis, we address the question of computational intelligibility for computer vision tasks. Due to the many complex transformations in deep learning algorithms, it is difficult for a user to interpret the returned prediction. Our proposal is to introduce what we call a 'semantic bottleneck' into the processing pipeline: a crossing point at which the representation of the image is entirely expressed in natural language, while retaining the efficiency of numerical representations. This semantic bottleneck makes it possible to detect failure cases in the prediction process so as to accept or reject the decision.
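
The feature-generation idea can be sketched in a few lines: artificial features are drawn conditionally on class attributes, then a standard discriminative classifier is trained on them. The linear "generator" below is a toy stand-in for the learned conditional generator in the thesis, and all data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_attr, n_feat = 5, 16
# One attribute vector per unseen class (hypothetical data).
attributes = rng.normal(size=(3, n_attr))
# Stand-in "generator": a fixed linear map plus noise. In the thesis a
# learned conditional generator plays this role.
W = rng.normal(size=(n_attr, n_feat))

def generate_features(attr, n=200, noise=0.5):
    return attr @ W + noise * rng.normal(size=(n, n_feat))

X = np.vstack([generate_features(a) for a in attributes])
y = np.repeat(np.arange(3), 200)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))  # the trained classifier can then score real features
```
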
Analyse et interprétation de scènes visuelles par approches collaboratives by Sabin Tiberius Strat

1 edition published in 2013 in English and held by 2 WorldCat member libraries worldwide

In recent years, the size of video collections has grown considerably. Searching and browsing such collections efficiently requires indexing with relevant terms, which brings us to the topic of this thesis: semantic video indexing. In this context, the Bag of Words (BoW) model, often built on SIFT or SURF features, gives good results on still images. Our first contribution is to improve the results of SIFT/SURF BoW descriptors on videos by preprocessing the videos with a model of the human retina, which makes the SIFT/SURF BoW descriptors more robust to video degradations and gives them sensitivity to spatio-temporal information. Our second contribution is a set of trajectory-based BoW descriptors, which contribute motion information and lead to a richer description of the videos. Our third contribution, motivated by the availability of complementary descriptors, is a late fusion that automatically determines how to combine a large set of descriptors and significantly improves the mean average precision of the detected concepts. All these approaches are validated on the video datasets of the TRECVid challenge, whose goal is the detection of visual semantic concepts in very rich, uncontrolled multimedia content.
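
As background, a minimal bag-of-words pipeline of the kind improved in this work might look as follows; the retina preprocessing proposed in the thesis would be applied to the frames before this step. The sketch assumes opencv-python and scikit-learn, and 'frames' is a list of grayscale images.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def bow_histograms(frames, vocab_size=64):
    """Return one L1-normalized visual-word histogram per frame."""
    sift = cv2.SIFT_create()
    per_frame = []
    for f in frames:
        _, desc = sift.detectAndCompute(f, None)
        per_frame.append(desc if desc is not None
                         else np.empty((0, 128), np.float32))
    # Visual vocabulary: k-means centroids over all descriptors.
    vocab = KMeans(n_clusters=vocab_size, n_init=10).fit(np.vstack(per_frame))
    hists = []
    for desc in per_frame:
        words = vocab.predict(desc) if len(desc) else np.array([], dtype=int)
        h, _ = np.histogram(words, bins=np.arange(vocab_size + 1))
        hists.append(h / max(h.sum(), 1))
    return np.array(hists)
```
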
Contours actifs paramétriques pour la segmentation d'images et vidéos by Frédéric Precioso (Book)

2 editions published in 2004 in French and held by 2 WorldCat member libraries worldwide

This thesis falls within the framework of active contour models: dynamic methods applied to image segmentation, for both still images and video. The image is represented by region and/or contour descriptors, and segmentation is treated as the minimization of a functional. The minimum is sought by propagating a region-based active contour. The efficiency of these methods lies above all in their robustness and speed. The objective of this thesis is threefold: to develop (i) a parametric representation of curves satisfying certain regularity constraints, (ii) the conditions necessary for a stable evolution of these curves, and (iii) reduced computational costs, so as to provide a method suited to applications requiring real-time response. We focus mainly on rigidity constraints, which allow greater robustness to noise. Concerning the evolution of active contours, we study the problems of applying the propagation force, managing topology, and ensuring convergence. We chose cubic spline curves: this family of curves offers interesting regularity properties, allows exact computation of the differential quantities involved in the functional, and considerably reduces the volume of data to be processed. Furthermore, we extended the classical interpolating spline model to an approximating spline model known as smoothing splines, which balances the regularity constraint against the interpolation error at the contour's sample points. This flexibility makes it possible to favor either precision or robustness. The implementation of these spline models has proven its efficiency in various segmentation applications.
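
The interpolation-versus-approximation balance described above can be illustrated with SciPy's parametric smoothing splines, where the parameter s plays the role of the regularity/fidelity trade-off. This is a generic sketch, not the thesis implementation.

```python
import numpy as np
from scipy.interpolate import splprep, splev

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
# Noisy samples of a closed contour (a circle, for illustration).
x = np.cos(t) + 0.05 * rng.normal(size=t.size)
y = np.sin(t) + 0.05 * rng.normal(size=t.size)

# s=0 would interpolate every sample point; a larger s trades interpolation
# error for smoothness, i.e. robustness to noise. Cubic by default (k=3).
tck, _ = splprep([x, y], s=0.5, per=True)
xs, ys = splev(np.linspace(0, 1, 200), tck)  # resampled smooth contour
```
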
Passage à l'échelle des systèmes de recommandation avec respect de la vie privée by Andrés Dario Moreno Barbosa

1 edition published in 2014 in English and held by 1 WorldCat member library worldwide

The main objective of this thesis is to propose a recommendation method that respects the privacy of users as well as the scalability of the system. To achieve this goal, a hybrid technique combining the content-based and collaborative filtering paradigms is used in order to obtain an accurate model for recommendation, under the constraint of mechanisms designed to preserve user privacy, particularly to reduce the risk of user exposure. The contributions of the thesis are threefold. First, a collaborative filtering model is defined using a client-side agent that interacts with public information about items kept on the recommender system side. This model is then extended into a hybrid approach that includes a content-based strategy for content recommendation. Using a knowledge model based on keywords that describe the item domain, the hybrid approach increases the predictive performance of the models without much computational effort in the cold-start setting. Finally, some strategies to improve the privacy provided by the recommender system are introduced: random noise generation is used to limit the inferences an attacker can make when continually observing the interaction between the client-side agent and the server, and a blacklist strategy is used to prevent the server from learning interactions that the user considers a violation of her privacy. The use of the hybrid model mitigates the negative impact these strategies have on the predictive performance of the recommendations.
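
A toy sketch of the exposure-limiting mechanisms (additive random noise plus a blacklist) might look like this; the profile format and noise scale are illustrative assumptions, not the thesis protocol.

```python
import numpy as np

rng = np.random.default_rng(42)

def expose_profile(true_ratings, noise_scale=0.5, blacklist=()):
    """Return the vector sent to the server: blacklisted items are
    withheld entirely, the rest carry additive random noise."""
    exposed = {}
    for item, r in true_ratings.items():
        if item in blacklist:
            continue  # never reveal interactions the user marked private
        exposed[item] = r + rng.normal(scale=noise_scale)
    return exposed

profile = {"film_a": 4.0, "film_b": 1.5, "film_c": 5.0}
print(expose_profile(profile, blacklist={"film_c"}))
```
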
Optimisation du suivi de personnes dans un réseau de caméras by Julien Badie

1 edition published in 2015 in English and held by 1 WorldCat member library worldwide

This thesis addresses the problem of improving the performance of people tracking in a new framework called the Global Tracker, which evaluates the quality of people trajectories (obtained by a simple tracker) and recovers potential errors from the previous stage. The first part of this Global Tracker estimates the quality of the tracking results, based on a statistical model that analyzes the distribution of the target features to detect potential anomalies. To differentiate real errors from natural phenomena, we analyze all the interactions between the tracked object and its surroundings (other objects and background elements). In the second part, a post-tracking method is designed to associate different tracklets (segments of a trajectory) corresponding to the same person that were not associated by the first tracking stage. This tracklet matching process selects the most relevant appearance features to compute a visual signature for each tracklet. Finally, the Global Tracker is evaluated on various benchmark datasets reproducing real-life situations, outperforming state-of-the-art trackers.
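
The tracklet matching step can be sketched as follows: each tracklet receives a visual signature (here simply the normalized mean of its per-detection appearance features, a simplification of the feature selection described above) and tracklets are greedily linked by cosine similarity.

```python
import numpy as np

def signature(features):
    """Visual signature of a tracklet from its (n_detections, d) features."""
    v = np.asarray(features, dtype=float).mean(axis=0)
    return v / np.linalg.norm(v)

def match_tracklets(ended, started, threshold=0.8):
    """Greedily link each ended tracklet to its best later candidate."""
    pairs = []
    for i, a in enumerate(ended):
        sims = [float(signature(a) @ signature(b)) for b in started]
        j = int(np.argmax(sims))
        if sims[j] >= threshold:
            pairs.append((i, j))  # tracklets i and j: same person, linked
    return pairs
```
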
Vers un système perceptuel de reconnaissance d'objets by Dounia Awad

1 edition published in 2014 in French and held by 1 WorldCat member library worldwide

The main objective of this thesis is to propose a pipeline for an object recognition algorithm close to human perception while, at the same time, addressing the complexity problems of content-based image retrieval (CBIR) algorithms: query run time and memory allocation. In this context, we propose a filter based on a visual attention system to select, from the interest points extracted by a traditional interest point detector, the salient points that match human interest. Testing our approach on the VOC 2005 databases, using Perreira Da Silva's system as the filter, demonstrated that we can maintain approximately the same performance of an object recognition system while selecting only 40% of the interest points (extracted by Harris-Laplace and Laplacian detectors), with an important gain in complexity (40% gain in query run time and 60% in complexity). Furthermore, we address the problem of the high dimensionality of descriptors in object recognition systems. We propose a new hybrid texture descriptor representing the spatial frequency of perceptual features extracted by a visual attention system. This descriptor has the advantage of a lower dimensionality than traditional descriptors. Evaluating it within an object recognition system (with Harris-Laplace and Laplacian interest point detectors) on the VOC 2007 databases showed a slight decrease in performance (5% loss in average precision) compared to the original SIFT-based system, with a 50% complexity gain. In addition, we evaluated our descriptor using a visual attention system as the interest point detector on the VOC 2005 databases: the experiment showed a slight decrease in performance (3% loss), while drastically reducing the complexity of the system (50% gain in query run time and 70% in complexity).
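
The filtering idea reduces to ranking detector keypoints by a saliency map and keeping the top fraction, as in this hedged sketch. 'saliency' is assumed to be an H x W map in [0, 1] produced by any visual attention system, such as Perreira Da Silva's.

```python
import numpy as np

def filter_by_saliency(keypoints, saliency, keep_ratio=0.4):
    """Keep the keep_ratio most salient keypoints; keypoints are (x, y)."""
    scores = [saliency[int(y), int(x)] for x, y in keypoints]
    order = np.argsort(scores)[::-1]            # most salient first
    n_keep = max(1, int(keep_ratio * len(keypoints)))
    return [keypoints[i] for i in order[:n_keep]]
```
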
Une méthode hybride pour la classification d'images à grain fin by Romaric Pighetti

1 edition published in 2016 in English and held by 1 WorldCat member library worldwide

The amount of images available on the Internet keeps growing, creating a need for algorithms capable of mining these images and retrieving information from them. Content-based image retrieval systems were developed for this purpose, but as databases grow, new challenges arise. This thesis studies fine-grained classification in particular: separating images that are visually quite similar but represent different concepts, while grouping images that are visually different but represent the same concept. We first show that classical content-based image retrieval techniques struggle with this task. Even techniques using support vector machines (SVMs), which perform very well for classification, do not solve it completely, as they often fail to explore enough of the search space. Other methods, such as evolutionary algorithms, are also studied for their ability to identify interesting regions of the search space in reasonable time, but their performance remains limited. The contribution of this thesis is therefore a hybrid system combining an evolutionary algorithm and an SVM, in which the evolutionary algorithm is used to iteratively build a training set for the SVM. This system is successfully evaluated on the Caltech-256 database, containing around 30,000 images split into 256 categories.
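
One round of the hybrid loop can be sketched as follows: the SVM is trained on the current set, and the "evolutionary" selection pressure is simplified here to preferring candidates near the decision boundary. A binary setting and an oracle labeler are assumed; the thesis uses a full evolutionary algorithm for this step.

```python
import numpy as np
from sklearn.svm import SVC

def hybrid_round(X_train, y_train, population, labeler, n_add=10):
    """One iteration: train the SVM, then add the most informative
    candidates (closest to the decision boundary) to the training set."""
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    margin = np.abs(clf.decision_function(population))  # binary setting
    chosen = np.argsort(margin)[:n_add]
    X_new = population[chosen]
    y_new = labeler(X_new)  # oracle / ground-truth labels, assumed available
    return np.vstack([X_train, X_new]), np.concatenate([y_train, y_new])

rng = np.random.default_rng(0)
X0 = rng.normal(size=(20, 2))
y0 = (X0[:, 0] > 0).astype(int)               # toy two-class problem
pool = rng.normal(size=(200, 2))
X1, y1 = hybrid_round(X0, y0, pool, lambda X: (X[:, 0] > 0).astype(int))
```
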
Gestion du contrôle de la diffusion des données d'entreprises et politiques de contrôles d'accès by Yoann Bertrand

1 edition published in 2017 in English and held by 1 WorldCat member library worldwide

The main objective of this thesis is to solve the problem of unintentional data leakage within companies. Such leaks can be caused by the combined use of Access Control (AC) and Transmission Control (TC) policies; moreover, using both AC and TC creates many problems for the security experts and administrators in charge of defining and maintaining such policies. Among these problems, we can underline the genericity problem of existing models, the coherence problem between AC and TC rules, and problems of density, adaptability, interoperability, and reactivity. In this thesis, we first define a meta-model that covers the main AC models used within companies. We also propose a coherent, semi-automatic generation of TC policies from existing AC policies to tackle the coherence problem, along with several mechanisms to tackle complexity, adaptability, and interoperability issues. To validate the relevance of our solution, we first conducted a survey among security experts and administrators. This survey highlighted several findings regarding the size and density of policies, the tediousness of having to define them, and the interest in functionalities covering the aforementioned problems. Finally, our solution has been tested on stochastically generated and real policies in order to take performance and reactivity into consideration. The results of these tests validate that our solution covers the problems identified.
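
The coherence idea, deriving transmission rules from access rules so the two cannot contradict each other, can be illustrated with a deliberately simplified rule format (hypothetical, not the thesis meta-model): if a user may read a resource, then sending that resource to the user is allowed; everything else is denied.

```python
# Toy AC rule base: (subject, permission, resource) triples.
ac_rules = {("alice", "read", "report.pdf"), ("bob", "read", "notes.txt")}

def allowed_to_send(resource, recipient):
    """Derived TC rule: transmission is coherent with read access."""
    return (recipient, "read", resource) in ac_rules

print(allowed_to_send("report.pdf", "alice"))  # True
print(allowed_to_send("report.pdf", "bob"))    # False: would leak data
```
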
Analyse temporelle et sémantique des réseaux sociaux typés à partir du contenu de sites généré par des utilisateurs sur le Web by Zide Meng

1 edition published in 2016 in English and held by 1 WorldCat member library worldwide

We propose an approach to detect topics, overlapping communities of interest, expertise, trends and activities in user-generated content sites, and in particular in question-answering forums such as StackOverflow. We first describe QASM (Question & Answer Social Media), a system based on social network analysis to manage the two main resources in question-answering sites: users and contents. We also introduce the QASM vocabulary used to formalize both the level of interest and the expertise of users on topics. We then propose an efficient approach to detect communities of interest. It relies on another method to enrich questions with a more general tag when needed. We compared three detection methods on a dataset extracted from the popular Q&A site StackOverflow. Our method, based on topic modeling and user membership assignment, is shown to be much simpler and faster while preserving the quality of the detection. We then propose an additional method to automatically generate a label for a detected topic by analyzing the meaning and links of its bag of words. We conduct a user study to compare different algorithms to choose the label. Finally, we extend our probabilistic graphical model to jointly model topics, expertise, activities and trends. We performed experiments with real-world data to confirm the effectiveness of our joint model, studying the users' behaviors and topics dynamics.
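
The topic-modeling step can be sketched with scikit-learn's LDA; the user-membership assignment that the thesis couples with it is not reproduced here, and the question texts are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "how to parse json in python",
    "python list comprehension performance",
    "css flexbox centering a div",
    "responsive css grid layout",
]
X = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
print(lda.transform(X))  # per-document topic mixtures
```
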
Approches d'apprentissage pour la classification à large échelle d'images de télédétection by Emmanuel Maggiori

1 edition published in 2017 in English and held by 1 WorldCat member library worldwide

The analysis of airborne and satellite images is one of the core subjects in remote sensing. In recent years, technological developments have facilitated the availability of large-scale sources of data, which cover significant extents of the earth's surface, often at impressive spatial resolutions. In addition to the evident computational complexity issues that arise, one of the current challenges is to handle the variability in the appearance of objects across different geographic regions. This requires classification methods that go beyond the analysis of individual pixel spectra and introduce higher-level contextual information into the process. In this thesis, we first propose a method to perform classification with shape priors, based on the optimization of a hierarchical subdivision data structure. We then delve into the use of the increasingly popular convolutional neural networks (CNNs) to learn deep hierarchical contextual features. We investigate CNNs from multiple angles, in order to address the different points required to adapt them to our problem. Among other subjects, we propose different solutions to output high-resolution classification maps and we study the acquisition of training data. We also create a dataset of aerial images over dissimilar locations and assess the generalization capabilities of CNNs. Finally, we propose a technique to polygonize the output classification maps, so as to integrate them into operational geographic information systems, thus completing the typical processing pipeline observed in a wide number of applications. Throughout this thesis, we experiment on hyperspectral, satellite, and aerial images, with scalability, generalization, and applicability goals in mind.
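
One standard way to output high-resolution classification maps, one of the points investigated above, is a fully convolutional network whose final 1x1 convolution yields per-pixel class scores. A minimal PyTorch sketch, illustrative rather than the thesis architecture:

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, in_ch=3, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, n_classes, 1),   # 1x1 conv = per-pixel class scores
        )

    def forward(self, x):
        return self.net(x)

# No fully-connected layer, so the output map follows the input resolution.
maps = TinyFCN()(torch.randn(1, 3, 256, 256))
print(maps.shape)  # torch.Size([1, 2, 256, 256])
```
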
Traitement automatique des langues pour la recherche d'information musicale : analyse profonde de la structure et du contenu des paroles de chansons by Michael Fell

1 edition published in 2020 in English and held by 1 WorldCat member library worldwide

Applications in Music Information Retrieval and computational musicology have traditionally relied on features extracted from the audio content of music, but have mostly ignored song lyrics. More recently, improvements in areas such as music recommendation have been obtained by taking into account external metadata related to the song. In this thesis, we argue that extracting knowledge from song lyrics is the next step in improving the user experience when interacting with music. To extract knowledge from vast amounts of song lyrics, we show for different textual aspects (their structure, their content, and their perception) how Natural Language Processing methods can be adapted and successfully applied to lyrics. For the structural aspect, we derive a structural description of lyrics by introducing a model that efficiently segments lyrics into their characteristic parts (e.g., intro, verse, chorus). We then represent the content of lyrics by summarizing them in a way that respects their characteristic structure. Finally, regarding the perception of lyrics, we study the problem of detecting explicit content in song texts. This task proves very difficult, and we show that the difficulty stems in part from the subjective nature of how lyrics are perceived one way or another depending on the context. We also address another lyrics perception problem by presenting our preliminary results on emotion recognition. One outcome of this thesis is an annotated corpus, the WASABI Song Corpus, a dataset of two million songs with NLP lyrics annotations at different levels.
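
Framed naively, explicit-content detection is plain supervised text classification, as in the toy sketch below (invented data and labels); the thesis shows why the real task is much harder, given the subjectivity of lyric perception.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

lyrics = ["la la la sunshine love", "explicit word another explicit word",
          "dancing all night long", "explicit word again and again"]
labels = [0, 1, 0, 1]   # 1 = explicit (hypothetical annotations)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(lyrics, labels)
print(model.predict(["sunshine dancing"]))  # expected: [0], non-explicit
```
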
Vers des modèles synergiques de l'estimation du mouvement en vision biologique et artificielle by Naga Venkata Kartheek Medathati

1 edition published in 2016 in English and held by 1 WorldCat member library worldwide

In this thesis, we study the problem of motion estimation in mammals and propose that scaling up models rooted in biology to real-world applications can give us fresh insights into biological vision. Using a classic model that describes the activity of directionally selective neurons in the V1 and MT areas of the macaque brain, we propose a feedforward V1-MT architecture for motion estimation and benchmark it on computer vision datasets (the first publicly available evaluation of this kind of model), revealing interesting shortcomings such as a lack of selectivity at motion boundaries and a lack of spatial association in the flow field. To address these, we propose two extensions: a form-modulated pooling strategy to minimize errors at texture boundaries, and a regression-based decoding scheme. These extensions improve estimation accuracy but also reemphasize the debate about the role of different cell types (characterized by their tuning curves) in encoding motion, for example the relative roles of pattern cells versus component cells. To understand this, we use a phenomenological neural fields model representing a population of directionally tuned MT cells to check whether different tuning behaviors can be reproduced by a recurrently interacting population, or whether different types of cells are needed explicitly. Our results indicate that a variety of tuning behaviors can be reproduced by a minimal network, explaining dynamical changes in tuning with changes in stimuli and leading us to question the high-inhibition regimes typically considered by models in the literature.
Détection et analyse d'événement dans les messages courts by Amosse Edouard

1 edition published in 2017 in English and held by 1 WorldCat member library worldwide

Social networks have transformed the Web from a read-only medium, where users could only consume information, into an interactive one that lets them create, share, and comment on content. A major challenge in processing social media information lies in the short length of the content, its informal nature, and the lack of contextual information. On the other hand, the Web contains structured knowledge bases built on ontology concepts, which can be used to enrich this content. This thesis explores the potential of using knowledge bases from the Web of Data to detect, classify, and track events in social media, particularly on Twitter. We address three research questions: i) How can messages that report events be extracted and classified? ii) How can specific events be identified? iii) Given an event, how can a timeline representing its various sub-events be constructed? The thesis contributes methods for generalizing named entities with ontology concepts to mitigate overfitting in supervised models; an adaptation of graph theory to model the relations between entities and other terms and thus characterize relevant events; and the use of domain ontologies and dedicated knowledge bases to model the relations between the features and actors of events. We show that the semantic enrichment of entities with information from the Web of Data improves the performance of both supervised and unsupervised learning models.
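
The entity-generalization idea can be sketched in a few lines: named entities are replaced by coarser ontology types so a supervised model sees type patterns rather than specific names, mitigating overfitting. The entity-to-type mapping below is a hypothetical stand-in for lookups against knowledge bases such as DBpedia.

```python
# Hypothetical entity -> ontology type mapping (a DBpedia lookup in practice).
ENTITY_TYPES = {
    "Barcelona": "SportsTeam",
    "Juventus": "SportsTeam",
    "Berlin": "City",
}

def generalize(tokens):
    """Replace known named entities with their ontology type."""
    return [ENTITY_TYPES.get(t, t) for t in tokens]

print(generalize("Barcelona beat Juventus in Berlin".split()))
# ['SportsTeam', 'beat', 'SportsTeam', 'in', 'City']
```
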
Description de contenu vidéo : mouvements et élasticité temporelle by Katy Blanc

1 edition published in 2018 in French and held by 1 WorldCat member library worldwide

Video recognition has gained in performance in recent years, especially thanks to improvements in deep learning on images. However, the jump in recognition rates on images does not directly carry over to videos. This limitation is certainly due to the added dimension, time, from which a robust description is still hard to extract. Recurrent neural networks introduce temporality but have limited memory. State-of-the-art methods for video description usually handle time as a spatial dimension, and combinations of video description methods reach the current best accuracies. However, the temporal dimension has its own elasticity, different from the spatial dimensions. Indeed, the temporal dimension of a video can be locally deformed: a partial dilation produces a visual slow-down during the video without changing its meaning, in contrast to a spatial dilation of an image, which modifies the proportions of the objects shown. We can thus expect to improve video content classification by creating a description invariant to these speed changes. This thesis focuses on the question of a robust video description that accounts for the elasticity of the temporal dimension, from three different angles. First, we describe motion content locally and explicitly: singularities are detected in the optical flow, then tracked along the time axis and organized into chains to describe parts of the video; we apply this description to sports content. Then we extract a global, implicit description using tensor decompositions, which allow a video to be treated as a multi-dimensional data table; the extracted descriptions are evaluated in a classification task. Finally, we study speed normalization using Dynamic Time Warping methods on series, and show that this normalization improves classification rates.
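
The speed-normalization angle relies on alignments that tolerate local time deformations; dynamic time warping is the classic tool, sketched here in pure NumPy (a generic implementation, not the thesis code).

```python
import numpy as np

def dtw_distance(a, b):
    """Alignment cost between two 1-D series of possibly different lengths."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

slow = np.sin(np.linspace(0, 2 * np.pi, 80))   # same signal, time-dilated
fast = np.sin(np.linspace(0, 2 * np.pi, 40))
print(dtw_distance(slow, fast))  # small despite the different speeds
```
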
Gestion du compromis vitesse-précision dans les systèmes de détection de piétons basés sur apprentissage profond by Ujjwal Ujjwal

1 edition published in 2019 in English and held by 1 WorldCat member library worldwide

The main objective of this thesis is to improve the detection performance of deep-learning-based pedestrian detection systems without sacrificing detection speed. Detection speed and accuracy are traditionally known to trade off against one another; this thesis aims to handle the trade-off in a way that results in faster and better pedestrian detection. To achieve this, we first conduct a systematic quantitative analysis of various deep learning techniques with respect to pedestrian detection. This analysis allows us to identify the optimal configuration of the various deep learning components of a pedestrian detection pipeline. We then consider the important question of convolutional layer selection for pedestrian detection and propose a pedestrian detection system called Multiple-RPN, which utilizes multiple convolutional layers simultaneously. We propose Multiple-RPN in two configurations, early-fused and late-fused, and demonstrate that early fusion is a better approach than late fusion for detection across pedestrian scales and occlusion levels. This work furthermore provides a quantitative demonstration of the selectivity of various convolutional layers to pedestrian scale and occlusion level. We next integrate the early fusion approach with pseudo-semantic segmentation to reduce the number of processing operations; pseudo-semantic segmentation is shown to reduce false positives and false negatives. Coupled with the reduced number of processing operations, this improves detection performance and speed (~20 fps) simultaneously, performing at state-of-the-art level on the Caltech-reasonable (3.79% miss rate) and CityPersons (7.19% miss rate) datasets. The final contribution of this thesis is an anchor classification layer, which further reduces the number of processing operations for detection. The result is a doubling of detection speed (~40 fps) with minimal loss in detection performance (3.99% and 8.12% miss rate on the Caltech-reasonable and CityPersons datasets, respectively), still at the state-of-the-art standard.
Video event detection and visual data processing for multimedia applications by Daniel Szolgay

1 edition published in 2011 in English and held by 1 WorldCat member library worldwide

This thesis (i) describes an automatic procedure for estimating the stopping condition of iterative deconvolution methods, based on an orthogonality criterion between the estimated signal and its gradient at a given iteration; (ii) presents a method that decomposes an image into a geometric (or "cartoon") part and a "texture" part, using parameter estimation and a stopping condition based on anisotropic diffusion with orthogonality, exploiting the fact that these two components, "cartoon" and "texture", must be independent; and (iii) describes a method for extracting moving foreground objects from video sequences captured by a hand-held camera. This method augments camera motion compensation with a new kernel-based estimation of the probability density function of the background pixels. The methods presented have been tested and compared with state-of-the-art algorithms.
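
The stopping rule in (i) can be sketched schematically: iterate a deconvolution update and stop once the current estimate and the update direction become nearly orthogonal. The gradient-descent update, symmetric blur, and thresholds below are illustrative assumptions, not the thesis procedure.

```python
import numpy as np

def deconvolve(y, blur, step=0.5, tol=1e-3, max_iter=500):
    """Gradient-descent deconvolution stopped by an orthogonality test.
    'blur' is assumed symmetric, so it acts as its own adjoint."""
    x = y.copy()
    for _ in range(max_iter):
        grad = blur(blur(x) - y)   # gradient of 0.5 * ||blur(x) - y||^2
        denom = np.linalg.norm(x) * np.linalg.norm(grad) + 1e-12
        if abs(x @ grad) / denom < tol:   # estimate and gradient orthogonal
            break
        x = x - step * grad
    return x

kernel = np.array([0.25, 0.5, 0.25])                # symmetric 1-D blur
blur = lambda v: np.convolve(v, kernel, mode="same")
y = blur(np.sin(np.linspace(0, 6, 200)))            # observed blurred signal
x_hat = deconvolve(y, blur)
```
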
Vers une reconnaissance des activités humaines non supervisées et des gestes dans les vidéos by Farhood Negin

1 edition published in 2018 in English and held by 1 WorldCat member library worldwide

The main goal of this thesis is to propose a complete framework for the automatic discovery, modeling, and recognition of human activities in videos. In order to model and recognize activities in long-term videos, we propose a framework that combines global and local perceptual information from the scene and accordingly constructs hierarchical activity models. In the first variant of the framework, a supervised classifier based on Fisher vectors is trained, and the predicted semantic labels are embedded in the constructed hierarchical models. In the second variant, to obtain a completely unsupervised framework, the trained visual codebooks are stored in the models rather than the semantic labels. We evaluate the proposed frameworks on two realistic Activities of Daily Living datasets recorded from patients in a hospital environment. Furthermore, to model fine motions of the human body, we propose four different gesture recognition frameworks, each of which accepts one data modality, or a combination of modalities, as input. We evaluate the developed frameworks in the context of a medical diagnostic test, namely the Praxis test, a gesture-based diagnostic test that has been accepted as diagnostically indicative of cortical pathologies such as Alzheimer's disease. We suggest a new challenge in gesture recognition: obtaining an objective opinion about correct and incorrect performances of very similar gestures. The experiments show the effectiveness of our deep-learning-based approach in gesture recognition and performance assessment tasks.
 
Audience level: 0.51 (from 0.49 for Visual ind ... to 0.97 for Visual ind ...)

Languages
English (31)
French (8)