Merged
15 changes: 13 additions & 2 deletions tutorials/W2D4_Macrolearning/W2D4_Intro.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,17 @@
"feedback_prefix = \"W2D4_Intro\""
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Macrolearning\n",
"\n",
"Over the last two weeks of this NeuroAI course, we've focused on many aspects of learning mechanisms, inductive biases, and how generalization emerges as a fundamental property of successful learning systems. Back on Day 2, in \"Comparing Tasks\", Leila Wehbe introduced the idea of learning at multiple temporal scales. That idea was only touched on lightly then; today we go much deeper into learning at a macro level of time, within a Reinforcement Learning framework. What does learning mean at the scale of biological evolution and at the level of a species? That's what today is all about. We hope you enjoy this fascinating topic!"
]
},
{
"cell_type": "markdown",
"metadata": {
Expand All @@ -57,7 +68,7 @@
"source": [
"## Prerequisites\n",
"\n",
"For this day, it would be beneficial to have prior experience working with the `pytorch` modeling package, as the last tutorials are going to be concentrated on defining architecture and training rules using this framework. For Tutorials 4 & 5, you might find yourself more comfortable if you are familiar with the reinforcement learning paradigm and with the Actor-Critic model, in particular. Actually, Tutorial 4 will elaborate on the agent already introduced previously in the last tutorial of Day 2, completing the discussion of meta-learning."
"For this day, prior experience with the `pytorch` modeling package will be helpful, as the later tutorials concentrate on defining architectures and training rules in this framework. For Tutorials 4 & 5, you will be more comfortable if you are familiar with the reinforcement learning paradigm, and with the Actor-Critic model in particular. In fact, Tutorial 4 elaborates on the agent introduced in the last tutorial of Day 2, completing the discussion of meta-learning. It may also be useful to revisit the recording in Tutorial 3 of Day 2, specifically the idea of Partially Observable Markov Decision Processes (POMDPs)."
]
},
{
Expand Down Expand Up @@ -206,7 +217,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
"version": "3.9.22"
}
},
"nbformat": 4,
Expand Down
43 changes: 28 additions & 15 deletions tutorials/W2D4_Macrolearning/W2D4_Tutorial1.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -24,9 +24,9 @@
"\n",
"__Content creators:__ Hlib Solodzhuk, Ximeng Mao, Grace Lindsay\n",
"\n",
"__Content reviewers:__ Aakash Agrawal, Alish Dipani, Hossein Rezaei, Yousef Ghanbari, Mostafa Abdollahi, Hlib Solodzhuk, Ximeng Mao, Samuele Bolotta, Grace Lindsay\n",
"__Content reviewers:__ Aakash Agrawal, Alish Dipani, Hossein Rezaei, Yousef Ghanbari, Mostafa Abdollahi, Hlib Solodzhuk, Ximeng Mao, Samuele Bolotta, Grace Lindsay, Alex Murphy\n",
"\n",
"__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk\n"
"__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Alex Murphy\n"
]
},
{
Expand All @@ -44,13 +44,13 @@
"\n",
"In this tutorial, we will explore the problems that arise from *distribution shifts*. Distribution shifts occur when the testing data distribution deviates from the training data distribution; that is, when a model is evaluated on data that somehow differs from what it was trained on.\n",
"\n",
"There are many ways that testing data can differ from training data. Two broad categories of distribution shifts are: **covariate shift** and **concept shift**.\n",
"There are many ways that testing data can differ from training data. Two broad categories of distribution shifts are: **covariate shift** and **concept shift**. While we expect most of you to be familiar with the term *concept*, the term *covariate* is used in different ways in different fields, so we want to clarify its usage in this tutorial. Unlike in statistics, where a covariate might be a confounding variable or, more specifically, a continuous predictor, in the context of distribution shifts in machine learning the term *covariate* is synonymous with any input **feature** (regardless of its causal status in predicting the model's desired output).\n",
"\n",
"In covariate shift, the distribution of input features changes. For example, consider a dog/cat classification task where the model was trained to differentiate between real photos of pets while the testing dataset consists entirely of cartoon characters.\n",
"In covariate shift, the distribution of input features, $P(X)$, changes. For example, consider a dog/cat classification task where the model was trained to differentiate the classes using real photos of pets, while the testing dataset poses the same classification task using images of cartoon characters exclusively.\n",
"\n",
"Concept shift, as its name suggests, involves a conceptual change in the relationship between features and the desired output. For example, a recommendation system may learn a user's preferences, but then those preferences change.\n",
"Concept shift, as its name suggests, involves a conceptual change in the relationship between features and the desired output, $P(Y|X)$. For example, a recommendation system may learn a user's preferences, but then those preferences change. It's the mapping from features to outputs that shifts, while the distribution of the inputs, $P(X)$, remains the same.\n",
"\n",
"We will explore both types of shifts using a simple function that represents the relationship between the day of the year and the price of fruits in the local market!"
"We will explore both types of shifts using a simple function that represents the relationship between the day of the year and the price of fruits in a local market!"
]
},
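To make the distinction concrete before diving in, here is a minimal sketch (not part of the tutorial's code) contrasting the two shifts on synthetic data; the quadratic relationship and the input ranges are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """A fixed, arbitrary 'true' relationship between feature and output."""
    return x ** 2

# Covariate shift: P(X) changes, the mapping from X to Y does not.
x_train = rng.uniform(0.0, 0.5, size=100)  # training inputs from one range
x_test = rng.uniform(0.5, 1.0, size=100)   # test inputs from another range
y_train, y_test = f(x_train), f(x_test)    # the same f generates both splits

# Concept shift: P(X) stays the same, the mapping from X to Y changes.
x = rng.uniform(0.0, 1.0, size=100)  # one shared input distribution
y_before = x ** 2                    # relationship during training
y_after = -(x ** 2)                  # a different relationship at test time
```

A model fit on `(x_train, y_train)` faces the first kind of mismatch; a model fit on `(x, y_before)` and evaluated against `y_after` faces the second.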
{
Expand Down Expand Up @@ -325,7 +325,7 @@
"\n",
"# Section 1: Covariate shift\n",
"\n",
"In this section, we are going to discuss the case of covariate shift in distribution - when the distribution of features (usually denoted by $\\mathbf{x}$) differs in the training and testing data."
"In this section, we are going to discuss covariate shift, a major type of distribution shift. Covariate shift arises when the distribution of features, $P(X)$, differs between the training and testing data. This happens, for instance, when the style of the inputs changes (e.g., real photos vs. cartoon illustrations). Another example is house price prediction: if you train on data from rural areas and test on data from urban areas, the input distributions are not consistent (a small, well-located urban house might be high-priced, while no such small, high-priced houses appear in the rural data)."
]
},
{
Expand All @@ -342,9 +342,9 @@
"\n",
"$$f(x) = A x^{2} + B \\sin(\\pi x + \\phi) + C$$\n",
"\n",
"This equation suggests quadratic annual behavior (with summer months being at the bottom of the parabola) as well as bi-weekly seasonality introduced by the $sin(\\pi x)$ term (with top values being the days where there is supply of fresh fruits to the market). Variables $A, B, \\phi \\:\\: \\text{and} \\:\\: C$ allow us to tune the day-price relation in different scenarios (for example, we will observe the role of $\\phi$ in the second section of the tutorial). For this particular case, let us set $A = 0.005$, $B = 0.1$, $\\phi = 0$ and $C = 1$.\n",
"This equation suggests quadratic annual behavior (with summer months at the bottom of the parabola) as well as bi-weekly seasonality introduced by the $\\sin(\\pi x)$ term (with peaks on the days when fresh fruit is supplied to the market). The variables $A, B, \\phi \\:\\: \\text{and} \\:\\: C$ allow us to tune the day-price relation in different scenarios. We will observe the role of $\\phi$ in the second section of the tutorial. For this particular case, let us set $A = 0.005$, $B = 0.1$, $\\phi = 0$ and $C = 1$.\n",
"\n",
"At first, let's take a look at the data - we will plot it.\n"
"Let's first take a look at the data by plotting it so we can orient ourselves to the input data used in this task.\n"
]
},
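As a quick sketch, the pricing function can be written directly in Python using the constants above; the evaluation grid of fractional week numbers is our assumption for illustration, not a fixed part of the tutorial:

```python
import numpy as np

def fruit_price(x, A=0.005, B=0.1, phi=0.0, C=1.0):
    """Quadratic annual trend plus bi-weekly seasonality (period-2 sine in x)."""
    return A * x ** 2 + B * np.sin(np.pi * x + phi) + C

# Hypothetical domain: fractional week numbers across a year.
x = np.linspace(0.0, 52.0, 521)
prices = fruit_price(x)  # default constants: A=0.005, B=0.1, phi=0, C=1
```

Note that the grid is deliberately finer than whole weeks: at integer `x`, the term `sin(pi * x)` is zero, so the seasonal oscillation is only visible between integer points.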
{
Expand Down Expand Up @@ -752,9 +752,9 @@
"\n",
"# Section 2: Concept shift\n",
"\n",
"Estimated time to reach this point from the start of the tutorial: 15 minutes\n",
"*Estimated time to reach this point from the start of the tutorial: 15 minutes*\n",
"\n",
"In this section, we are going to explore another case of distribution shift, which is different in nature from covariate shift: concept shift."
"In this section, we are going to explore another case of distribution shift, which is different in nature from covariate shift: concept shift. This is when the distribution of the inputs, $P(X)$, remains stable, but the mapping from features to predictions, $P(Y|X)$, differs between the training and testing data distributions."
]
},
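One way to sketch this kind of concept shift in code, assuming the same pricing function introduced earlier (the shifted phase value `phi = pi` is our illustrative choice, not necessarily the one the tutorial uses):

```python
import numpy as np

def fruit_price(x, A=0.005, B=0.1, phi=0.0, C=1.0):
    return A * x ** 2 + B * np.sin(np.pi * x + phi) + C

x = np.linspace(0.0, 10.0, 200)      # the same inputs in training and testing
y_train = fruit_price(x, phi=0.0)    # concept seen during training
y_test = fruit_price(x, phi=np.pi)   # shifted concept at test time

# P(X) is unchanged, but P(Y|X) differs: at every x the seasonal term flips
# sign, so the two curves disagree by up to 2*B at the seasonal peaks.
gap = np.abs(y_train - y_test).max()
```

A model that learned `y_train` would keep predicting the original phase and systematically miss the shifted peaks, which is exactly the failure mode explored below.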
{
Expand All @@ -773,7 +773,7 @@
"<details>\n",
"<summary>Answer</summary>\n",
"<br>\n",
"Yes, indeed, it involves a sinusoidal phase shift — we only need to change the $\\phi$ value.\n",
"Yes, indeed, it involves a sinusoidal phase shift — we only need to change the phi value.\n",
"</details>\n",
"\n",
"Let's take a look at how well the model generalizes to this unexpected change as well."
Expand Down Expand Up @@ -849,7 +849,7 @@
"execution": {}
},
"source": [
"Indeed, the model's predictions are capturing the original phase, not the phase-shifted function. Well, it's somewhat expected: we fixed values of features (in this case, week number), trained at first on one set of output values (prices), and then changed the outputs to measure the model's performance. It's obvious that the model will perform badly. Still, it's important to notice the effect of concept shift and this translation between conceptual effect and its impact on modeling. "
"The model's predictions are capturing the original phase, not the phase-shifted function. Well, it's somewhat expected: we fixed values of features (in this case, week number), trained at first on one set of output values (prices), and then changed the outputs to measure the model's performance. It's obvious that the model will perform badly. Still, it's important to notice the effect of concept shift and this translation between conceptual effect and its impact on modeling."
]
},
{
Expand Down Expand Up @@ -937,7 +937,20 @@
"Here's what we learned:\n",
"\n",
"1. Covariate and concept shifts are two different types of data distribution shifts.\n",
"2. Distribution shifts negatively impact model performance.\n",
"2. Distribution shifts negatively impact model performance."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# The Big Picture\n",
"\n",
"Distribution shifts are a huge issue in modern ML systems. Understanding how these shifts arise becomes more important as such systems take on roles in the services we interact with daily. During COVID-19, for example, product replenishment systems failed spectacularly: an underlying shift (panic buying of certain items) that the models did not anticipate broke pipelines that relied on their statistical predictions.\n",
"\n",
"In NeuroAI, distribution shifts can arise in numerous places: for example, when training a model on sets of neurons that belong to different brain areas, or on the same population of neurons whose activity differs due to a confounding third factor, rendering the training and test feature distributions different. Awareness of potential distribution shifts is incredibly important, and systems should monitor for them continuously. NeuroAI currently lags behind in adopting evaluations that catch these kinds of issues. Our goal is to bring this to the forefront so that, in your careers as NeuroAI practitioners, you are aware of the factors that can affect the models you build.\n",
"\n",
"In the next tutorials, we are going to address the question of generalization: what techniques and methods can deal with poor generalization performance due to distribution shifts."
]
Expand Down Expand Up @@ -971,7 +984,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
"version": "3.9.22"
}
},
"nbformat": 4,
Expand Down