

Analysis of bootstrap and subsampling in high-dimensional regularized regression (code)

Lucas Clarte¹*, Adrien Vandenbroucque¹˒², Guillaume Dalle³˒¹˒², Bruno Loureiro⁴, Florent Krzakala², Lenka Zdeborova¹

1 École Polytechnique Fédérale de Lausanne (EPFL), Statistical Physics of Computation laboratory, CH-1015 Lausanne, Switzerland

2 École Polytechnique Fédérale de Lausanne (EPFL), Information, Learning and Physics laboratory, CH-1015 Lausanne, Switzerland

3 École Polytechnique Fédérale de Lausanne (EPFL), Information and Network Dynamics laboratory, CH-1015 Lausanne, Switzerland

4 Département d’Informatique, École Normale Supérieure - PSL & CNRS, Paris, France

* Corresponding author's email: lucas.clarte@epfl.ch
DOI: 10.24435/materialscloud:az-j9 [version v1]

Publication date: Jan 30, 2025

How to cite this record

Lucas Clarte, Adrien Vandenbroucque, Guillaume Dalle, Bruno Loureiro, Florent Krzakala, Lenka Zdeborova, Analysis of bootstrap and subsampling in high-dimensional regularized regression (code), Materials Cloud Archive 2025.25 (2025), https://doi.org/10.24435/materialscloud:az-j9

Description

We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, taking the limit where the number of samples n and dimension d of the covariates grow at a comparable fixed rate α = n/d. Our findings are three-fold: i) resampling methods are fraught with problems in high dimensions and exhibit the double-descent-like behavior typical of these situations; ii) only when α is large enough do they provide consistent and reliable error estimations (we give convergence rates); iii) in the over-parametrized regime α < 1 relevant to modern machine learning practice, their predictions are not consistent, even with optimal regularization. This record provides the code to reproduce the numerical experiments of the related paper "Analysis of bootstrap and subsampling in high-dimensional regularized regression".
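To make the setting concrete, the sketch below shows a pair bootstrap estimate of the coefficient variance for ridge regression on synthetic data at a fixed ratio α = n/d. This is an illustrative Python example only, not the record's code (the archived repository is linked under Files); the function name and toy data are assumptions for this sketch.

```python
import numpy as np

def bootstrap_ridge_variance(X, y, lam=1.0, n_boot=200, rng=None):
    """Pair-bootstrap estimate of the per-coordinate variance of the
    ridge-regression estimator. Illustrative sketch, not the record's code."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    betas = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        # closed-form ridge solution: (Xb' Xb + lam * I)^{-1} Xb' y
        betas[b] = np.linalg.solve(Xb.T @ Xb + lam * np.eye(d), Xb.T @ yb)
    return betas.var(axis=0)

# Toy data at alpha = n/d = 2, i.e. the regime where the paper shows
# resampling estimates become consistent.
rng = np.random.default_rng(0)
n, d = 200, 100
X = rng.standard_normal((n, d)) / np.sqrt(d)
beta_star = rng.standard_normal(d)
y = X @ beta_star + 0.5 * rng.standard_normal(n)

var_hat = bootstrap_ridge_variance(X, y, lam=0.1, n_boot=100, rng=1)
print(var_hat.shape)
```

Rerunning the same sketch with n < d (α < 1) illustrates the over-parametrized regime where, per the paper's result (iii), such bootstrap estimates are no longer consistent even with optimal regularization.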

Materials Cloud sections using this data

No Explore or Discover sections associated with this archive record.

Files

File name: BootstrapAsymptotics-main.zip
MD5: 9ecf4b0632902209f673b53919ac1512
Size: 957.0 KiB
Description: Compressed files contained in the repository https://github.com/spoc-group/BootstrapAsymptotics

File name: README.txt
MD5: 98c73ff79efc66b38ed648aad8eef65e
Size: 500 Bytes
Description: README file describing the structure of the code

License

Files and data are licensed under the terms of the following license: Creative Commons Attribution 4.0 International.
Metadata, except for email addresses, are licensed under the Creative Commons Attribution Share-Alike 4.0 International license.

Keywords

MARVEL/P2, uncertainty quantification, neural networks, numerical simulation

Version history:

2025.25 (version v1) [This version] Jan 30, 2025 — DOI: 10.24435/materialscloud:az-j9