Dynamical Variational Autoencoders: A Comprehensive Review

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to sequential data that model not only the latent space but also the temporal dependencies within a sequence of data vectors and/or corresponding latent vectors, relying on recurrent neural networks or state space models. In this paper, we perform an extensive literature review of these models. Importantly, we introduce and discuss a general class of models called Dynamical Variational Autoencoders (DVAEs) that encompasses a large subset of these temporal VAE extensions. We then present in detail seven different instances of DVAE that were recently proposed in the literature, with an effort to homogenize the notation and presentation, and to relate these models to existing classical temporal models (which are also presented for the sake of completeness). We reimplemented these seven DVAE models and present the results of an experimental benchmark conducted on the speech analysis-resynthesis task (the PyTorch code will be made publicly available). An extensive discussion at the end of the paper comments on important issues concerning the DVAE class of models and describes directions for future research.

Keywords

Deep Learning; Variational inference; Dimensionality reduction; Graphical models; Dynamics; Learning and statistical methods; Nonlinear signal processing; Speech/audio/image/video compression; Latent variable models; Time series analysis

Domains

Computer Vision and Pattern Recognition [cs.CV]; Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

Girin_at_al_DVAE_review_2021_arXiv_version.pdf (1.28 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02926215

Submitted on: Tuesday, January 18, 2022, 17:09:19

Last modified on: Monday, April 3, 2023, 14:50:05

Dates and versions

hal-02926215, version 1 (18-01-2022)

hal-02926215, version 2 (05-07-2022)

Identifiers

HAL Id: hal-02926215, version 1
arXiv: 2008.12595
DOI: 10.1561/2200000089

Cite

Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, et al. Dynamical Variational Autoencoders: A Comprehensive Review. Foundations and Trends in Machine Learning, 2021, 15 (1-2), pp.1-175. ⟨10.1561/2200000089⟩. ⟨hal-02926215v1⟩

Collections

LJK_GI_PERCEPTION

1042 views

1270 downloads