
A study of some trade-offs in statistical learning : online learning, generative models and fairness

Abstract : Machine learning algorithms are celebrated for their impressive performance on many tasks that we thought were dedicated to human minds, from handwritten digit recognition (LeCun et al. 1990) to cancer prognosis (Kourou et al. 2015). Nevertheless, as machine learning becomes more and more ubiquitous in our daily lives, there is a growing need for precisely understanding the behaviour and the limits of these algorithms.

Statistical learning theory is the branch of machine learning which aims at providing a powerful modelling formalism for inference problems as well as a better understanding of the statistical properties of learning algorithms. Importantly, statistical learning theory allows one to (i) get a better understanding of the cases in which an algorithm performs well, (ii) quantify trade-offs inherent to learning for better-informed algorithmic choices, and (iii) provide insights to develop new algorithms which will eventually outperform existing ones or tackle new tasks.

Relying on the statistical learning framework, this thesis presents contributions related to three different learning problems: online learning, learning generative models and, finally, fair learning.

In the online learning setup, in which the sample size is not known in advance, we provide general anytime deviation bounds (or confidence intervals) whose width has the rate given in the Law of the Iterated Logarithm for a general class of convex M-estimators, comprising the mean, the median, quantiles and Huber’s M-estimators.

Regarding generative models, we propose a convenient framework for studying adversarial generative models (Goodfellow et al. 2014) from a statistical perspective to assess the impact of a (possible) low intrinsic dimensionality of the data on the error of the generative model. In our framework, we establish non-asymptotic risk bounds for the Empirical Risk Minimizer (ERM).

Finally, our work on fair learning consists in a broad study of the Demographic Parity (DP) constraint, a popular constraint in the fair learning literature. DP essentially requires predictors to treat groups defined by a sensitive attribute (e.g., gender or ethnicity) the same. In particular, we propose a statistical minimax framework to precisely quantify the cost in risk of introducing this constraint in the regression setting.
Contributor : Abes Star
Submitted on : Friday, January 14, 2022 - 12:20:10 PM
Last modification on : Friday, January 14, 2022 - 12:20:10 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03435618, version 1



Nicolas Schreuder. A study of some trade-offs in statistical learning : online learning, generative models and fairness. Statistics [math.ST]. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAG004⟩. ⟨tel-03435618⟩


