Multi-scale Feature Learning Dynamics: Insights for Double Descent

Introduction

This repository contains the official implementation of:

Multi-scale Feature Learning Dynamics: Insights for Double Descent (anonymous authors)

Reproducibility

To ensure reproducibility, we publish the code, saved logs, and expected results of every experiment.

We claim that all figures presented in the manuscript can be reproduced with the following requirements:

Python 3.7.10
PyTorch 1.4.0
torchvision 0.5.0
tqdm
matplotlib 3.4.3
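Before running the figure scripts, it can help to confirm that the local environment matches the versions pinned above. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute and that PyTorch is imported as `torch`. A local build suffix such as `+cu100` is ignored when comparing.

```python
import importlib
import sys

# Versions pinned in this README; tqdm is listed without a version.
PINNED = {"torch": "1.4.0", "torchvision": "0.5.0", "matplotlib": "3.4.3", "tqdm": None}
PINNED_PYTHON = "3.7.10"


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def find_mismatches(installed, pinned):
    """List human-readable problems; an empty list means everything matches."""
    problems = []
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif wanted is not None and found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, README pins {wanted}")
    return problems


if __name__ == "__main__":
    python_version = ".".join(str(part) for part in sys.version_info[:3])
    if python_version != PINNED_PYTHON:
        print(f"python: found {python_version}, README pins {PINNED_PYTHON}")
    installed = {name: installed_version(name) for name in PINNED}
    for problem in find_mismatches(installed, PINNED):
        print(problem)
```

No output means the environment matches the pinned versions. A mismatch does not necessarily prevent the scripts from running, but the figures were produced with the versions listed above.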

ResNet experiments on CIFAR-10

ResNet experiments on CIFAR-10 took 12,000 GPU hours on Nvidia V100 GPUs. Code for managing the experiments with the Slurm resource manager is provided in the README in the ResNet_experiments folder.

To reproduce each figure of the manuscript

python fig1.py: The generalization error as training proceeds. (top): The case where only the fast-learning feature or only the slow-learning feature is trained. (bottom): The case where both features are trained with $\kappa=100$.

python fig2_ab.py: Heat map of the empirical generalization error (0-1 classification error) for a ResNet-18 trained on CIFAR-10 with 15% label noise. The x-axis denotes the regularization strength and the y-axis the training time.

python fig2_cd.py: The same plot, but with analytical results from the teacher-student setup. We observe a qualitative match between the ResNet-18 results and our analytical results.

python fig3.py: Left: Phase diagram of the generalization error as a function of R(t) and Q(t). The trajectories describe the evolution of R(t) and Q(t) as training proceeds. Each trajectory corresponds to a different $\kappa$, the condition number of the modulation matrix, which describes the ratio of the rates at which the two sets of features are learned. Right: The corresponding generalization curves for different $\kappa$, plotted over the training time axis.
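To regenerate all figures in one go, the four commands above can be wrapped in a small driver. This is a hedged sketch rather than a script shipped with the repository: `build_commands` and `run_figures` are hypothetical helpers, and the sketch assumes it is run from the repository root, where the figure scripts live. Scripts that are not found are skipped with a message instead of failing.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed in this README.
FIGURE_SCRIPTS = ["fig1.py", "fig2_ab.py", "fig2_cd.py", "fig3.py"]


def build_commands(scripts, python=sys.executable):
    """Return one argument list per script, using the current interpreter by default."""
    return [[python, script] for script in scripts]


def run_figures(scripts=FIGURE_SCRIPTS, repo_root="."):
    """Run each figure script that exists under repo_root; return the names that ran."""
    root = Path(repo_root)
    ran = []
    for command in build_commands(scripts):
        script = command[-1]
        if not (root / script).exists():
            print(f"skipping {script}: not found in {root.resolve()}")
            continue
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(command, cwd=str(root), check=True)
        ran.append(script)
    return ran


if __name__ == "__main__":
    run_figures()
```

Using `sys.executable` keeps every script on the same interpreter, and therefore the same pinned package versions, as the driver itself.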

Extra experiments

Match between theory and experiments

Here, we validate our analytical results (Eqs. 12, 14 substituted into Eq. 6) by comparing them with empirical gradient descent:

Empirical gradient descent
Analytical results - the general exact case (Eq. 9 substituted into Eq. 6)
Analytical results - the fast-slow approximate case (Eqs. 12, 14 substituted into Eq. 6)

python extra_experiments/emp_vs_analytic.py
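The idea of checking a closed-form prediction against a simulated gradient descent run can be illustrated on a toy problem. The sketch below is an assumption-laden illustration and does not implement Eqs. 6, 9, 12, or 14 of the manuscript: it uses a separable quadratic loss with one fast and one slow direction whose curvatures differ by a factor `kappa` (chosen here as 100 to echo fig1), and compares the simulated iterates with the exact solution of the same recursion. All function and variable names are hypothetical.

```python
def gradient_descent(rates, targets, step_size, num_steps):
    """Simulate gradient descent from w = 0 on
    L(w) = sum_i 0.5 * rates[i] * (w[i] - targets[i]) ** 2."""
    weights = [0.0 for _ in rates]
    for _ in range(num_steps):
        weights = [w - step_size * r * (w - t)
                   for w, r, t in zip(weights, rates, targets)]
    return weights


def closed_form(rates, targets, step_size, num_steps):
    """Exact iterate of the same recursion:
    w_i = t_i * (1 - (1 - step_size * r_i) ** num_steps)."""
    return [t * (1.0 - (1.0 - step_size * r) ** num_steps)
            for r, t in zip(rates, targets)]


kappa = 100.0                    # assumed ratio between fast and slow rates
rates = [1.0, 1.0 / kappa]       # fast direction, slow direction
targets = [1.0, 1.0]

empirical = gradient_descent(rates, targets, step_size=0.1, num_steps=50)
analytical = closed_form(rates, targets, step_size=0.1, num_steps=50)
max_gap = max(abs(e - a) for e, a in zip(empirical, analytical))

print("empirical :", empirical)
print("analytical:", analytical)
print("max gap   :", max_gap)
```

After 50 steps the fast direction is almost fully learned while the slow direction has barely moved, which is the kind of time-scale separation that `kappa` controls in this toy setting. The repository's emp_vs_analytic.py performs the actual comparison for the model studied in the manuscript.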

Previous experiments with different setups

We also provide further experiments where we vary the following variables:

n: number of training examples
d: number of total dimensions
p: number of fast learning dimensions

Four variants of fig1

Four variants of fig3

Interactive notebook

Finally, to try different setups, please check out the following anonymous Colab notebook: Link
