This is the repository for our ETH Deep Learning project, based on the original paper "Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning". We propose Utility-based Stochastic Gradient Descent with Adaptive Noise Injection as an improvement to the Utility-based Perturbed Gradient Descent (UPGD) method. In our experiments, our model achieves higher average accuracy (55.38% vs. 55.29%) and higher average plasticity (45.52% vs. 41.86%) than the UPGD baseline.
Details of our approach and results can be found in our Report.
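For intuition, here is a minimal sketch of a UPGD-style step in which both the gradient and the injected noise are gated by a per-weight utility trace. It is illustrative only: the exact utility scaling and the adaptive-noise rule of our method are described in the Report, and the parameter names below simply mirror the command-line flags used later in this README.

```python
# Illustrative sketch only, not the exact update implemented in this repo.
import torch

def utility_gated_step(weight, grad, utility_trace, lr=0.01,
                       beta_utility=0.999, sigma=0.001):
    # Trace of a first-order utility: -grad * weight approximates how much
    # the loss would increase if this weight were removed.
    utility_trace.mul_(beta_utility).add_(-grad * weight, alpha=1 - beta_utility)
    # Normalize and squash to [0, 1]; the gate is small for useful weights.
    scaled = utility_trace / utility_trace.abs().max().clamp(min=1e-8)
    gate = 1.0 - torch.sigmoid(scaled)
    # Useful weights are largely protected (mitigates forgetting); less useful
    # weights get the full gradient step plus Gaussian noise (restores plasticity).
    noise = sigma * torch.randn_like(weight)
    weight.add_(-lr * (grad + noise) * gate)
    return weight, utility_trace
```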
Here we describe how to reproduce the results.
Clone the repository and install the dependencies:

```bash
git clone --recursive git@github.com:yumikim381/upgd-dl-project.git
python3.7 -m venv .upgd
source .upgd/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install HesScale/.
pip install .
```
We use Algorithm 1 of the original UPGD paper as our baseline. It can be run as follows:
```bash
python3 core/run/run_stats.py \
    --task label_permuted_cifar10_stats \
    --learner baseline \
    --seed 19 \
    --lr 0.01 \
    --beta_utility 0.999 \
    --sigma 0.001 \
    --weight_decay 0.0 \
    --network convolutional_network_relu_with_hooks \
    --n_samples 1000000
```
Use the notebook `notebooks/visualize_kernels.ipynb` to run through the visualizations. Each visualization has its own cell, and the notebook runs with the same set of requirements as the rest of the code, with the possible addition of ipython and jupyterlab.
To conduct the accuracy and catastrophic forgetting experiment using the CIFAR-10 dataset, run the following:
```bash
python3 core/run/run_stats.py \
    --task label_permuted_cifar10_stats \
    --learner usgd \
    --seed 19 \
    --lr 0.01 \
    --beta_utility 0.999 \
    --sigma 0.001 \
    --weight_decay 0.0 \
    --network convolutional_network_relu_with_hooks \
    --n_samples 1000000
```
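For context, the `label_permuted_cifar10_stats` task presents CIFAR-10 as a stream in which the label assignment is periodically permuted, so a learner that cannot adapt forgets previously correct mappings. The snippet below is a hypothetical toy illustration of that idea; the permutation period and the generator are assumptions, not the repository's task implementation.

```python
# Toy illustration (not the repo's task code): periodically permute the
# label mapping of a classification stream to create task boundaries.
import numpy as np

def label_permuted_stream(xs, ys, n_classes=10, permute_every=2500, seed=0):
    rng = np.random.default_rng(seed)
    perm = np.arange(n_classes)
    for step, (x, y) in enumerate(zip(xs, ys)):
        if step > 0 and step % permute_every == 0:
            perm = rng.permutation(n_classes)   # new "task": labels remapped
        yield x, perm[y]                        # same inputs, shuffled targets
```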
To conduct the loss of plasticity experiment based on the MNIST dataset, run:
```bash
python3 core/run/run_stats.py \
    --task input_permuted_mnist_stats \
    --learner usgd \
    --seed 19 \
    --lr 0.01 \
    --beta_utility 0.999 \
    --sigma 0.001 \
    --weight_decay 0.0 \
    --network conv_mnist \
    --n_samples 1000000
```
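The `input_permuted_mnist_stats` task instead keeps labels fixed and permutes the input pixels at task boundaries, which is the standard way to probe loss of plasticity. Again, the sketch below is only illustrative, with an assumed permutation period:

```python
# Toy illustration (not the repo's task code): scramble pixel positions so
# each new task shows the same digits under a different fixed permutation.
import numpy as np

def input_permuted_stream(xs, ys, permute_every=2500, seed=0):
    rng = np.random.default_rng(seed)
    perm = None
    for step, (x, y) in enumerate(zip(xs, ys)):
        if step % permute_every == 0:
            perm = rng.permutation(x.size)   # new task: new pixel scrambling
        yield x.reshape(-1)[perm].reshape(x.shape), y
```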
`usgd` can be replaced with the following options to run the other variations we have tried out:

- Layer-wise Noise Scaling (see the sketch after this list)
  - `weight_norm` for scaling by the norm of the weights
  - `grad_norm` for scaling by the norm of the gradients
  - `ratio_norm` for scaling by the ratio of the gradient norm to the weight norm
- Kernel Utility
  - `entire_kernel` for entire-kernel evaluation
  - `column_kernel` for column-wise kernel evaluation
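As a rough sketch of what the layer-wise scaling variants do, the noise standard deviation can be modulated per layer by the weight norm, the gradient norm, or their ratio. The function below is illustrative only; the exact scaling used in the repository is defined in the learner implementations.

```python
# Illustrative only: scale the injected noise per layer by a chosen norm.
import torch

def layerwise_noise(weight, grad, sigma=0.001, mode="weight_norm"):
    if mode == "weight_norm":        # scale by the norm of the weights
        scale = weight.norm()
    elif mode == "grad_norm":        # scale by the norm of the gradients
        scale = grad.norm()
    elif mode == "ratio_norm":       # gradient norm relative to weight norm
        scale = grad.norm() / weight.norm().clamp(min=1e-8)
    else:
        scale = torch.tensor(1.0)
    return sigma * scale * torch.randn_like(weight)
```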
Use the notebook `notebooks/get_results.ipynb` to get the evaluations of the two experiments. The metrics are printed in table format, and graphical visualizations of the performance are provided.