diff --git a/tutorials/README.md b/tutorials/README.md index 69e508c5e..e5bcba111 100644 --- a/tutorials/README.md +++ b/tutorials/README.md @@ -179,6 +179,7 @@ Slides: [Intro](https://mfr.ca-1.osf.io/render?url=https://osf.io/v7ber/?direct% | Intro | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb?flush_cache=true) | | Tutorial 1 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb?flush_cache=true) | | Tutorial 2 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb) | [![Open In 
kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb?flush_cache=true) | +| Tutorial 3 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb?flush_cache=true) | | Outro | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb?flush_cache=true) | diff --git a/tutorials/W2D5_Mysteries/README.md b/tutorials/W2D5_Mysteries/README.md index 899affd8f..0f04a63ae 100644 --- a/tutorials/W2D5_Mysteries/README.md +++ b/tutorials/W2D5_Mysteries/README.md @@ -7,6 +7,7 @@ | 
Intro | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb?flush_cache=true) | | Tutorial 1 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial1.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial1.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial1.ipynb?flush_cache=true) | | Tutorial 2 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial2.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial2.ipynb) | [![View the 
notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial2.ipynb?flush_cache=true) | +| Tutorial 3 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial3.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial3.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial3.ipynb?flush_cache=true) | | Outro | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb?flush_cache=true) | @@ -17,5 +18,6 @@ | Intro | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb) | [![Open In 
kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb?flush_cache=true) | | Tutorial 1 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb?flush_cache=true) | | Tutorial 2 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb?flush_cache=true) | +| Tutorial 3 | [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb?flush_cache=true) | | Outro | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb) | [![Open In kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb) | [![View the notebook](https://img.shields.io/badge/render-nbviewer-orange.svg)](https://nbviewer.jupyter.org/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/W2D5_Outro.ipynb?flush_cache=true) | diff --git a/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb b/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb index bc018d3c2..e7bea670c 100644 --- a/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb +++ b/tutorials/W2D5_Mysteries/W2D5_Intro.ipynb @@ -59,6 +59,19 @@ "feedback_prefix = \"W2D5_Intro\"" ] }, + { + "cell_type": "markdown", + "metadata": { + "execution": {} + }, + "source": [ + "# Mysteries \n", + "\n", + "Welcome to the final day of the NeuroAI course! 
You've covered a wide range of topics, and we hope you've enjoyed the content we've put together and put your mind to work absorbing both the low-level and the high-level details of this at-times tricky, mathematically dense material. As you can tell from the title of this final day, we're switching to a different type of educational content: we're leaving you with some of the open mysteries in the field and talking you through some of the ongoing work aimed at finding solutions. \n", "\n", "We hope that, equipped with these tools, you'll be inspired by some of these active mysteries, and perhaps your names will one day be on papers that help uncover the mechanisms behind these fascinating ideas." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ @@ -214,7 +227,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/W2D5_Tutorial1.ipynb b/tutorials/W2D5_Mysteries/W2D5_Tutorial1.ipynb index 304c3f063..af88d951e 100644 --- a/tutorials/W2D5_Mysteries/W2D5_Tutorial1.ipynb +++ b/tutorials/W2D5_Mysteries/W2D5_Tutorial1.ipynb @@ -25,9 +25,9 @@ "\n", "__Content creators:__ Steve Fleming, Guillaume Dumas, Samuele Bolotta, Juan David Vargas, Hakwan Lau, Anil Seth, Megan Peters\n", "\n", - "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault, Alex Murphy\n", "\n", - "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault\n" + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault, 
Alex Murphy\n" ] }, { @@ -50,7 +50,9 @@ "\n", "2. Explore core frameworks for analyzing consciousness, including diagnostic criteria, and will compare objective probabilities with subjective credences.\n", "\n", - "3. Explore reductionist theories of consciousness, such as Global Workspace Theory (GWT), theories of metacognition, and Higher-Order Thought (HOT) theories.\n" + "3. Explore reductionist theories of consciousness, such as Global Workspace Theory (GWT), theories of metacognition, and Higher-Order Thought (HOT) theories.\n", + "\n", + "The topic of consciousness and what it means to be *conscious* is a long-standing open question in neuroscience, and it has recently drawn a lot of attention in machine learning in the context of large language models and foundation models. Some have claimed that these models exhibit sparks of consciousness, and a strong debate continues in the community. The topic is therefore likely to remain prominent in NeuroAI, and we hope you can start to build some familiarity with the tools used to quantify and study this fascinating subject. 
\n" ] }, { @@ -450,6 +452,7 @@ "\n", " # Close the figure to free up memory\n", " plt.close(fig)\n", + "\n", "# Function to configure the training environment and load the models\n", "def get_test_patterns(factor):\n", " \"\"\"\n", @@ -532,7 +535,7 @@ " discrimination_performances.append(discrimination_performance)\n", "\n", "\n", - " chance_level = torch.Tensor( generate_chance_level((200*factor,100)))\n", + " chance_level = torch.Tensor( generate_chance_level((200*factor,100))).to(device)\n", " discrimination_random= round((chance_level[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", " print(\"chance level\" , discrimination_random)\n", "\n", @@ -1024,7 +1027,9 @@ " \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n", " \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n", "\n", - " return device" + " return device\n", + "\n", + "device = set_device()" ] }, { @@ -1275,9 +1280,24 @@ "source": [ "In this section, we are exploring an important concept in machine learning: the idea that the complexity we observe in the physical world often arises from simpler, independently functioning parts. Think of the world as being made up of different modules or units that usually operate on their own but sometimes interact with each other. This is similar to how different apps on your phone work independently but can share information when needed.\n", "\n", + "---\n", + "\n", + "### Modularity Recap\n", + "Remember in W2D1, our day entitled **Macrocircuits**? In Tutorial 3 of that day, the focus was on neural network modularity and we showed you that, compared to a single holistic architecture, having separable modular approaches, each with their own inductive biases, provided a much more efficient mechanism to model complex data. Not only that, but these sub-modules had stronger inductive biases and were easily generalizable to novel inputs. 
Today, we're also shining a spotlight on a similar idea, but from a much more integrative perspective applied to the grand challenge of modeling consciousness. Those of you who are interested should revisit that tutorial for its ideas on modularity and how it can support complex systems more efficiently than holistic, unitary mechanisms.\n", "\n", "---" ] }, { "cell_type": "markdown", "id": "1fb33b12", "metadata": { "execution": {} }, "source": [ "This idea is closely linked to the field of causal inference, which studies how these separate units or mechanisms cause and influence each other. The goal is to understand and model how these mechanisms work both individually and together. Importantly, these mechanisms often interact only minimally, which means they can keep working properly even if changes occur in other parts. This characteristic makes them very robust, or capable of handling disturbances well.\n", "\n", - "A specific example from machine learning that uses this idea is called Recurrent Independent Mechanisms (RIMs). In RIMs, different parts of the model mostly work independently, but they can also communicate or \"pay attention\" to each other when it’s necessary. This setup allows for efficient and dynamic processing of information. The research paper available here (https://arxiv.org/pdf/1909.10893) discusses this approach in detail. It highlights the benefits of designing models that recognize and utilize the independence and occasional interactions of these mechanisms. Such models are often more adaptable and can generalize better, meaning they perform well across a variety of different tasks or situations." + "A specific example from machine learning that uses this idea is called Recurrent Independent Mechanisms (RIMs). In RIMs, different parts of the model mostly work independently, but they can also communicate or \"pay attention\" to each other when it’s necessary. 
This setup allows for efficient and dynamic processing of information. The research paper available here (https://arxiv.org/pdf/1909.10893) discusses this approach in detail." ] }, { @@ -1287,7 +1307,7 @@ "execution": {} }, "source": [ - "### RIMs\n", + "### Recurrent Independent Mechanisms (RIMs)\n", "\n", "RIM networks are a type of recurrent neural network that process temporal sequences. Inputs are processed one element at a time, the different units of the network process the inputs, a hidden state is updated and propagated through time. RIM networks can thus be used as a drop-in replacement for RNNs like LSTMs or GRUs. The key differences are that:\n", "\n", @@ -1297,26 +1317,26 @@ "\n", "**Selecting the input**\n", "\n", - "Each RIM unit gets activated and updated when the input is pertinent to it. Using key-value attention, the queries originate from the RIMs, while the keys and values are derived from the current input. The key-value attention mechanisms enable dynamic selection of which variable instance (i.e., which entity or object) will serve as input to each RIM mechanism:\n", + "Recall that in W1D5 (Microcircuits) we had a tutorial on **Attention** (Tutorial 3), where we covered how modern Transformer-based neural networks implement attention via the Query, Key, and Value matrices. If those concepts are hazy, you might benefit from reviewing the tutorial videos from that day, as they are used in the RIM networks we will look at today. Each RIM unit is activated and updated when it attends to a pertinent part of the input. Using key-value attention (KV matrices), the queries (Q matrix) originate from the RIMs, while the keys and values are derived from the current input. In standard deep learning terminology, this is very closely related to the concept of **cross-attention**. 
The key-value attention mechanisms enable dynamic selection of which variable instance (i.e., which entity or object) will serve as input to each RIM mechanism:\n", "\n", "$$\n", "\\text{Attention}(Q, K, V) = \\text{softmax}\\left(\\frac{Q K^T}{\\sqrt{d}}\\right) V\n", "$$\n", "\n", - "Linear transformations are used to construct keys $K = XW^e $, values $ V = XW^v $ and queries $Q = h_t W^q_k$.\n", + "Linear transformations are used to construct keys $K = XW^k $, values $ V = XW^v $ and queries $Q = h_t W^q_i$.\n", "\n", "Here:\n", "\n", - "* $ W^v $ is a matrix mapping from an input element to the corresponding value vector for the weighted attention\n", - "* $ W^e $ is a weight matrix which maps the input to the keys.\n", - "* $ W^q_k $ is a per-RIM weight matrix which maps from the RIM’s hidden state to its queries.\n", + "* $ W^k $ is a weight matrix which maps the input to the keys (Key matrix)\n", + "* $ W^v $ is a matrix mapping from an input element to the corresponding value vector for the weighted attention (Value matrix)\n", + "* $ W^q_i $ is a per-RIM weight matrix which maps from the RIM’s hidden state to its queries (Query matrix)\n", "* $h_t$ is the hidden state for a RIM mechanism.\n", "\n", "\n", "$\\oplus$ refers to the row-level concatenation operator. The attention thus is:\n", "\n", "$$\n", - "A^{(\\text{in})}_k = \\text{softmax}\\left(\\frac{h_t W^q_k (XW^e)^T}{\\sqrt{d_e}}\\right) XW^v, \\text{ where } \\theta^{(\\text{in})}_k = (W^q_k, W^e, W^v)\n", + "A^{(\\text{in})}_i = \\text{softmax}\\left(\\frac{h_t W^q_i (XW^k)^T}{\\sqrt{d_e}}\\right) XW^v, \\text{ where } \\theta^{(\\text{in})}_i = (W^q_i, W^k, W^v)\n", "$$\n", "\n", "At each step, the top-k RIMs are selected based on their attention scores for the actual input. Essentially, the RIMs compete at each step to read from the input, and only the RIMs that prevail in this competition are allowed to read from the input and update their state." 
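The two formulas above can be sketched in a few lines of NumPy. This is only an illustration, not the notebook's actual implementation: all shapes and names are made up, and the winner-selection rule here (largest raw attention score) is a simplification of the paper's full scheme, which also attends to a null input.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def rim_input_attention(h, X, Wq, Wk, Wv, top_k):
    """Sketch of the RIM input-selection step.

    h  : (n_rims, d_h)    hidden state h_t of each RIM
    X  : (n_in, d_x)      rows of the current input
    Wq : (n_rims, d_h, d) per-RIM query weights W^q_i
    Wk : (d_x, d)         shared key weights W^k
    Wv : (d_x, d_v)       shared value weights W^v
    """
    K = X @ Wk                                # keys derived from the input
    V = X @ Wv                                # values derived from the input
    Q = np.einsum('nh,nhd->nd', h, Wq)        # one query per RIM
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n_rims, n_in)
    A = softmax(scores, axis=-1)
    read = A @ V                              # A^(in)_i: what each RIM would read
    # competition: only the top_k RIMs (here: strongest raw score) stay active
    active = np.argsort(-scores.max(axis=-1))[:top_k]
    return read, active
```

Only the RIMs indexed by `active` would then update their hidden states; the rest keep their previous states unchanged.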
@@ -1341,10 +1361,10 @@ "source": [ "This figure shows how RIMs work over two steps.\n", "\n", - "- Query generation: each RIM starts by creating a query. This query helps each RIM pull out the necessary information from the data it receives at that moment.\n", - "- Attention-based selection: on the right side of the figure, you can see that some RIMs are chosen to be active (colored in blue) and others stay inactive (colored in white). This selection is made using a special scoring system called attention, which picks RIMs based on how relevant they are to the current visual inputs.\n", - "- State transition for active RIMs: the RIMs that get activated update their internal states according to their specific rules, using the information they've gathered. The RIMs that aren’t activated don’t change and keep their previous states.\n", - "- Communication between RIMs: finally, the active RIMs share information with each other, but this communication is limited. They use a system similar to key-value pairing, which helps them share only the most important information needed for the next step." + "- **Query generation**: each RIM starts by creating a query. This query helps each RIM pull out the necessary information from the data it receives at that moment.\n", + "- **Attention-based selection**: on the right side of the figure, you can see that some RIMs are chosen to be active (colored in blue) and others stay inactive (colored in white). This selection is made using a special scoring system called attention, which picks RIMs based on how relevant they are to the current visual inputs.\n", + "- **State transition for active RIMs**: the RIMs that get activated update their internal states according to their specific rules, using the information they've gathered. 
The RIMs that aren’t activated don’t change and keep their previous states.\n", + " - **Communication between RIMs**: finally, the active RIMs share information with each other, but this communication is limited. They use a system similar to key-value pairing, which helps them share only the most important information needed for the next step." ] }, { @@ -1354,7 +1374,7 @@ "execution": {} }, "source": [ - "To make this more concrete, consider the example of modeling the motion of two balls. We can think of each ball as an independent mechanism. Although both balls are affected by Earth’s gravity, and very slightly by each other, they generally move independently. They only interact significantly when they collide. This model captures the essence of independent mechanisms interacting sparsely, a key idea in developing more effective and generalizable AI systems.\n", + "To make this more concrete, consider the example of modeling the motion of two balls. We can think of each ball as an independent mechanism. Although both balls are affected by Earth’s gravity, and very slightly by each other, they generally move independently. They interact significantly only when they collide. This model captures the essence of independent mechanisms interacting **sparsely**, a key idea in developing more effective and generalizable AI systems (see W1D5 - Tutorial 1 for the tutorial devoted entirely to sparsity and its benefits).\n", "\n", "Now, let's download the RIM model!" ] }, { @@ -1423,11 +1443,11 @@ "\n", "This is the test setup:\n", "\n", - "1. Train on 14x14 images of MNIST digits\n", + "1. Train on `14x14` images of MNIST digits\n", "2. 
Test on:\n", - " - 16x16 images (validation set 1)\n", - " - 19x19 images (validation set 2)\n", - " - 24x24 images (validation set 3)\n", + " - `16x16` images (validation set 1)\n", + " - `19x19` images (validation set 2)\n", + " - `24x24` images (validation set 3)\n", "\n", "This approach helps to understand whether the model can still recognize the digits accurately even when they appear at different scales or resolutions than those on which it was originally trained. By testing the model on various image sizes, we can determine how flexible and effective the model is at dealing with variations in input data.\n", "\n", @@ -1655,7 +1675,7 @@ "execution": {} }, "source": [ - "The accuracy of the model on 16x16 images is fairly close to what was observed on smaller images, indicating that the increase in size to 16x16 does not significantly impact the model's ability to recognize the images. However, RIMs demonstrate generalization better, when working with the larger 19x19 and 24x24 images - compared to LSTMs." + "The accuracy of the model on `16x16` images is fairly close to what was observed on smaller images, indicating that the increase in size to `16x16` does not significantly impact the model's ability to recognize the images. However, RIMs generalize better than LSTMs when working with the larger `19x19` and `24x24` images." ] }, { @@ -1825,9 +1845,9 @@ "execution": {} }, "source": [ - "In this section, we explore a deep learning model based on Global Workspace Theory from cognitive neuroscience. 
You can read more about this model in the linked research paper here (https://arxiv.org/pdf/2103.01197.pdf). The core idea behind this model is the use of a *shared global workspace* which serves as a coordination platform for the various specialized modules within the network.\n", "\n", "Essentially, the model incorporates multiple specialist modules, each focusing on different aspects of a problem. Unlike in the RIM mechanism, these modules do not communicate *directly* with each other, but rather interact *indirectly* through a central shared memory. Communication with the central shared memory is handled, once again, by an attention mechanism." ] }, { @@ -1847,7 +1867,7 @@ "execution": {} }, "source": [ - "By centralizing communication this way, the model mimics how a human brain might focus only on the most relevant information at any given time. 
It mimics a sort of \"cognitive economy,\" where only the most relevant data is processed and shared among modules, filtered through a bottleneck that forces the model to use a highly efficient, redundancy-reducing representation, which enhances the overall performance of the system. Moreover, the theory embeds some of the assumptions of the Global Workspace Theory (GWT) of consciousness, which suggests that consciousness arises from the ability of various brain processes to access a shared information platform, the **Global Workspace**." ] }, { @@ -1867,13 +1887,17 @@ "execution": {} }, "source": [ - "In our model, the interaction among the modules (or specialists) and the shared workspace is managed by a key-query-value cross-attention mechanism. Here’s how it works:\n", + "In our model, the interaction among the modules (or specialists) and the shared workspace is managed by a QKV cross-attention mechanism, as explained above. \n", + "\n", + "Here’s how it works:\n", + "\n", + "- **Key**: Each specialist module generates a key which represents the type of information the module wants to share.\n", + "- **Query**: The workspace generates a query at each computational step. This query represents what the workspace needs to know next to facilitate the overall task.\n", + "- **Value**: Each specialist also prepares a value, which is the actual information it proposes to add to the workspace.\n", "\n", - "- Key: Each specialist module generates a key which represents the type of information the module wants to share.\n", - "- Query: The workspace generates a query at each computational step. 
This query represents what the workspace needs to know next to facilitate the overall task.\n", - "- Value: Each specialist also prepares a value, which is the actual information it proposes to add to the workspace.\n", + "Please refer to the textual explanation above where some of this is defined in a bit more detail if this is still unclear to you.\n", "\n", - "Fill in the code below to implement this mechanism." + "Your task is to fill in the code below to implement this mechanism." ] }, { @@ -2028,7 +2052,9 @@ "execution": {} }, "source": [ - "After updating the shared workspace with the most critical signals, this information is then broadcast back to all specialists. Each specialist updates its state using this broadcast information, which can involve an attention mechanism for consolidation and an update function (like an LSTM or GRU step) based on the new combined state. Let's add this method!" + "After updating the shared workspace with the most critical signals, this information is then broadcast back to all specialists. Each specialist updates its state using this broadcast information, which can involve an attention mechanism for consolidation and an update function (like an LSTM or GRU step) based on the new combined state. \n", + "\n", + "Let's add this method!" ] }, { @@ -2203,9 +2229,7 @@ "execution": {} }, "source": [ - "Blindsight is a neurological phenomenon where individuals with damage to their primary visual cortex can still respond to visual stimuli without consciously perceiving them.\n", - "\n", - "\n", + "Blindsight is a neurological phenomenon where individuals with damage to their primary visual cortex can still respond to visual stimuli without consciously perceiving them. 
Megan showed you a video of this from a real patient navigating a corridor and successfully avoiding objects that researchers had strategically placed in his way.\n", "\n", "To study this, we use a simulated dataset that mimics the conditions of blindsight. This dataset contains 400 patterns, equally split between two types:\n", "\n", @@ -2801,11 +2825,11 @@ "execution": {} }, "source": [ - "In this section, we'll merge ideas from earlier discussions to present a fresh perspective on how conscious awareness might arise in neural systems. This view comes from higher-order theory, which suggests that consciousness stems from the ability to monitor basic, or first-order, information processing activities, instead of merely broadcasting information globally. This concept agrees with global workspace theories that emphasize the need for a comprehensive monitor that oversees various first-order processes. Moreover, it extends the ideas discussed previously about the role of a second-order network, which helps us understand phenomena like blindsight, where a person can respond to visual stimuli without consciously seeing them.\n", + "In this section, we'll merge ideas from earlier discussions to present a fresh perspective on how conscious awareness might arise in neural systems. This view comes from higher-order theory, which suggests that consciousness stems from the ability to monitor basic, or first-order, information processing activities, instead of merely broadcasting information globally. Like GWT, this concept emphasizes the need for a comprehensive monitor that oversees various first-order processes. It extends the idea of the role of a second-order network, which helps us understand phenomena like blindsight.\n", "\n", - "To analyze how our brains handle and update perceptions, we'll operate within a simplified Bayesian framework. 
This framework helps us evaluate how we perceive reality based on the information we receive. For example, if you hear rustling leaves, your brain calculates the likelihood of it being caused by the wind versus an animal. This calculation involves updating what we initially guess (our prior belief) with new evidence (observed data), resulting in a new, more informed belief (posterior probability).\n", + "To analyze how our brains handle and update perceptions, we'll use a Bayesian framework. This framework helps us evaluate how we perceive reality based on the information we receive. For example, if you hear rustling leaves, your brain calculates the likelihood of it being caused by the wind versus an animal. This calculation involves updating what we initially guess (our prior belief) with new evidence (observed data), resulting in a new, more informed belief (posterior probability).\n", "\n", - "The function below calculates these updated beliefs and uses Kullback-Leibler (KL) divergence to quantify how much the new information changes our understanding. The KL divergence is a way of measuring the 'distance' between the initial belief and your updated belief. In essence, it's measuring how much you have to change your mind given new evidence.\n", + "The function below calculates these updated beliefs and uses *Kullback-Leibler (KL)* divergence to quantify how much the new information changes our understanding. The KL divergence is a way of measuring the 'distance' between the initial belief and your updated belief. In essence, it's measuring how much you have to change your mind given new evidence.\n", "\n", "We base our analysis on a flat, or single-layer, Bayesian network model. This model directly connects our sensory inputs with our perceptual states, simplifying the complex interactions in our brain into a more manageable form. By stripping away the complexities of multi-layered networks, we focus purely on how direct observations impact our consciousness. 
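To make the prior-to-posterior update and the KL measure concrete, here is a minimal sketch. The probabilities are invented for illustration, and this is not the notebook's own function, just the same two-step computation in its simplest form:

```python
import numpy as np

def bayes_update(prior, likelihood):
    # Posterior is proportional to likelihood * prior, then renormalized
    post = likelihood * prior
    return post / post.sum()

def kl_divergence(p, q):
    # KL(p || q): how far belief p has moved away from belief q
    return np.sum(p * np.log(p / q))

# Two hypotheses for the rustling leaves: wind vs. animal
prior = np.array([0.7, 0.3])        # initial guess: probably just the wind
likelihood = np.array([0.2, 0.6])   # this particular rustle is more typical of an animal

posterior = bayes_update(prior, likelihood)
surprise = kl_divergence(posterior, prior)  # Bayesian surprise: how much we changed our mind
```

Strong evidence against the prior flips the belief toward "animal" and yields a large KL value; evidence that merely confirms the prior leaves the posterior nearly unchanged and the KL near zero.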
This simplified approach helps us to better understand the intricate dance between perception and awareness in our neural systems." ] @@ -2853,16 +2877,6 @@ " return post_W, KL_W" ] }, - { - "cell_type": "markdown", - "id": "e4cfba4a-b48a-48c5-a554-f03e7096af2e", - "metadata": { - "execution": {} - }, - "source": [ - "**Make our stimulus space**" - ] - }, { "cell_type": "markdown", "id": "11ffb999-c213-4400-8f1b-dac5b42ff5e1", @@ -2870,11 +2884,13 @@ "execution": {} }, "source": [ - "The model we are using is grounded in classical \"signal detection theory\", or SDT for short. SDT is in turn a special case of a Bayesian generative model, in which an arbitrary \"evidence\" value is drawn from an unknown distribution, and the task of the observer is to infer which distribution this evidence came from.\n", + "### Defining our Stimulus Space\n", + "\n", + "The model we are using is grounded in classical *Signal Detection Theory* (SDT). SDT is a special case of a Bayesian generative model, in which an arbitrary *evidence* value is drawn from an unknown distribution. The task of the observer is to infer *which distribution* this evidence came from.\n", "\n", - "In SDT, an observer receives a piece of evidence—this could be any sensory input, like a sound, a light signal, or a statistical data point. The evidence comes from one of several potential distributions. Each distribution represents a different \"state of the world.\" For instance, one distribution might represent the presence of a signal (like a beep), while another might represent just noise. The observer uses Bayesian inference to assess the probability that the received evidence came from one distribution or another. This involves updating their beliefs (probabilities) based on the new evidence. 
Based on the probabilities calculated through Bayesian inference, the observer decides which distribution most likely produced the evidence.\n", + "In SDT, an observer receives a piece of evidence (this could be any sensory input, like a sound, a light signal, or a statistical data point). The evidence comes from one of several potential distributions. Each distribution represents a different *state of the world*. For instance, one distribution might represent the presence of a signal (like a beep), while another might represent just noise. The observer uses Bayesian inference to assess the probability that the received evidence came from one distribution or another. This involves updating their beliefs (probabilities) based on the new evidence. Based on the probabilities calculated through Bayesian inference, the observer decides which distribution most likely produced the evidence.\n", "\n", - "Let's now imagine we have two categories, A and B - for instance, left- and right-tilted visual stimuli. The sensory \"evidence\" can be written as 2D vector, where the first element is evidence for A, and the second element evidence for B:" + "Let's now imagine we have two categories, A and B - for instance, left- and right-tilted visual stimuli. The sensory *evidence* can be written as a 2D vector, where the first element is evidence for A, and the second element evidence for B:" ] }, { @@ -3019,7 +3035,7 @@ "execution": {} }, "source": [ - "The model partitions the stimuli in the expected way. KL divergence is higher further away from the boundaries, as measuring stimuli far away from the boundaries makes the model rapidly update its beliefs. 
This is because the model is more *certain* about the stimuli's class when they are far from the boundaries." ] }, { @@ -3029,11 +3045,11 @@ "execution": {} }, "source": [ - "**Add in higher-order node for global detection**\n", + "#### Add in higher-order node for global detection\n", "\n", - "So far, our model has been straightforward, or \"flat,\" where each perceptual state (like leftward tilt, rightward tilt, or no stimulus) is treated separately. However, real-life perception often requires more complex judgments about the presence or absence of any stimulus, not just identifying specific types. This is where a higher-order node comes into play.\n", + "So far, our model has been straightforward, or *flat*, where each perceptual state (like leftward tilt, rightward tilt, or no stimulus) is treated separately. However, real-life perception often requires more complex judgments about the presence or absence of any stimulus, not just identifying specific types. This is where a higher-order node comes into play.\n", "\n", - "**Introducing the \"A\" Level:**\n", + "#### Introducing the \"A\" Level:\n", "\n", "Think of the \"A\" level as a kind of overseer or monitor that watches over the lower-level states ($w_1$, $w_2$, etc.). This higher-order node isn't concerned with the specific content of the stimulus (like which direction something is tilting) but rather with whether there's any significant stimulus at all versus just noise. It takes inputs from the same data (pairs of $X$'s), but it adds a layer of awareness. It evaluates whether the data points suggest any meaningful content or if they're likely just random noise.\n", "\n", @@ -3325,8 +3341,11 @@ "execution": {} }, "source": [ - "**Simulate ignition (asymmetry vs. symmetry)**\n", + "We have included some further details on the notion of ignition. Please feel free to toggle the switch below to learn more. 
If you're running low on time, you can simply run the cell below and come back to this section later. The outro video also gives a broad overview of this concept.\n", "\n", "
\n", "    Simulate Ignition (asymmetry vs. symmetry)\n", "    \n", "The HOSS architecture is designed to detect whether something is there or not. When it detects something, it produces larger prediction errors than when it detects nothing. These prediction errors are tracked using a method called Kullback-Leibler (KL) divergence, particularly at a certain level within the model known as the W level.\n", "\n", "This increase in prediction errors when something is detected is similar to what happens in the human brain, a phenomenon known as global ignition responses. These are big surges in brain activity that happen when we become conscious of something. Research like that conducted by Del Cul et al. (2007) and Dehaene and Changeux (2011) supports this concept, linking it to the global workspace model. This model describes consciousness as the sharing of information across different parts of the brain.\n", @@ -3335,124 +3354,9 @@ "\n", "We then classify these prediction errors based on whether the model recognizes a stimulus as \"seen\" or \"unseen.\" If the model has a response indicating \"seen,\" it shows more activity than when it indicates \"unseen.\" This is what we refer to as ignition — more activity for \"seen\" stimuli.\n", "\n", - "However, it's crucial to understand that in the HOSS model, these ignition-like responses don't directly cause the global sharing of information in the network. Rather, they are secondary effects or byproducts of other calculations happening within the network. Essentially, these bursts of activity are outcomes of deeper processes in the network, not the direct mechanisms for distributing information throughout the system."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "b09f812a-f202-4f3d-ac66-247b322002e7", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Experiment parameters\n", - "mu = np.array([[0.5, 0.5], [3.5, 0.5], [0.5, 3.5]])\n", - "Nsubjects = 30\n", - "Ntrials = 600\n", - "cond = np.concatenate((np.ones(Ntrials//3), np.ones(Ntrials//3)*2, np.ones(Ntrials//3)*3))\n", - "Wprior = [0.5, 0.5]\n", - "Aprior = 0.5\n", - "\n", - "# Sensory precision values\n", - "gamma = np.linspace(0.1, 10, 6)\n", - "\n", - "# Initialize lists for results\n", - "all_KL_w_yes = []\n", - "sem_KL_w_yes = []\n", - "all_KL_w_no = []\n", - "sem_KL_w_no = []\n", - "all_KL_A_yes = []\n", - "sem_KL_A_yes = []\n", - "all_KL_A_no = []\n", - "sem_KL_A_no = []\n", - "all_prob_y = []\n", - "\n", - "##############################################################################\n", - "## TODO for students: Fill in the missing parts (...)\n", - "## Fill in the missing parts to complete the function and remove\n", - "raise NotImplementedError(\"Student exercise\")\n", - "##############################################################################\n", - "\n", - "for y in tqdm(..., desc='Processing gammas'):\n", - " Sigma = np.diag([1./np.sqrt(y)]*2)\n", - " mean_KL_w = np.zeros((Nsubjects, 4))\n", - " mean_KL_A = np.zeros((Nsubjects, 4))\n", - " prob_y = np.zeros(Nsubjects)\n", - "\n", - " for s in tqdm(range(Nsubjects), desc=f'Subjects for gamma={y}', leave=False):\n", - " KL_w = np.zeros(len(cond))\n", - " KL_A = np.zeros(len(cond))\n", - " posteriorAware = np.zeros(len(cond))\n", - "\n", - " # Generate sensory samples\n", - " X = np.array([multivariate_normal.rvs(mean=mu[int(c)-1, :], cov=Sigma) for c in cond])\n", - "\n", - " # Model inversion for each trial\n", - " for i, x in enumerate(X):\n", - " post_w, post_A, KL_w[i], KL_A[i] = HOSS_evaluate(x, mu, Sigma, Aprior, Wprior)\n", - " posteriorAware[i] = post_A[1] # Assuming post_A is a tuple with awareness 
probability at index 1\n", - "\n", - " binaryAware = posteriorAware > 0.5\n", - " for i in range(4):\n", - " conditions = [(cond == 1), (cond != 1), (cond == 1), (cond != 1)]\n", - " aware_conditions = [(binaryAware == 0), (binaryAware == 0), (binaryAware == 1), (binaryAware == 1)]\n", - " mean_KL_w[s, i] = np.mean(KL_w[np.logical_and(aware_conditions[i], conditions[i])])\n", - " mean_KL_A[s, i] = np.mean(KL_A[np.logical_and(aware_conditions[i], conditions[i])])\n", - "\n", - " prob_y[s] = np.mean(binaryAware[cond != 1])\n", - "\n", - " # Aggregate results across subjects\n", - " all_KL_w_yes.append(np.nanmean(mean_KL_w[:, 2:4].flatten()))\n", - " sem_KL_w_yes.append(np.nanstd(mean_KL_w[:, 2:4].flatten()) / np.sqrt(Nsubjects))\n", - " all_KL_w_no.append(np.nanmean(mean_KL_w[:, :2].flatten()))\n", - " sem_KL_w_no.append(np.nanstd(mean_KL_w[:, :2].flatten()) / np.sqrt(Nsubjects))\n", - " all_KL_A_yes.append(np.nanmean(mean_KL_A[:, 2:4].flatten()))\n", - " sem_KL_A_yes.append(np.nanstd(mean_KL_A[:, 2:4].flatten()) / np.sqrt(Nsubjects))\n", - " all_KL_A_no.append(np.nanmean(mean_KL_A[:, :2].flatten()))\n", - " sem_KL_A_no.append(np.nanstd(mean_KL_A[:, :2].flatten()) / np.sqrt(Nsubjects))\n", - " all_prob_y.append(np.nanmean(prob_y))\n", - "\n", - "with plt.xkcd():\n", - "\n", - " # Create figure\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # First subplot: Probability of reporting \"seen\" for w_1 or w_2\n", - " plt.subplot(1, 3, 1)\n", - " plt.plot(gamma, all_prob_y, linewidth=2)\n", - " plt.xlabel('Stimulus strength')\n", - " plt.ylabel('Prob. 
report \"seen\" for w_1 or w_2')\n", - " plt.xticks(fontsize=14)\n", - " plt.yticks(fontsize=14)\n", - " plt.box(False)\n", - "\n", - " # Second subplot: K-L divergence, perceptual states\n", - " plt.subplot(1, 3, 2)\n", - " plt.errorbar(gamma, all_KL_w_yes, yerr=sem_KL_w_yes, linewidth=2, label='Seen')\n", - " plt.errorbar(gamma, all_KL_w_no, yerr=sem_KL_w_no, linewidth=2, label='Unseen')\n", - " plt.legend(frameon=False)\n", - " plt.xlabel('Stimulus strength')\n", - " plt.ylabel('KL-divergence, perceptual states')\n", - " plt.xticks(fontsize=14)\n", - " plt.yticks(fontsize=14)\n", - " plt.box(False)\n", - "\n", - " # Third subplot: K-L divergence, awareness state\n", - " plt.subplot(1, 3, 3)\n", - " plt.errorbar(gamma, all_KL_A_yes, yerr=sem_KL_A_yes, linewidth=2, label='Seen')\n", - " plt.errorbar(gamma, all_KL_A_no, yerr=sem_KL_A_no, linewidth=2, label='Unseen')\n", - " plt.legend(frameon=False)\n", - " plt.xlabel('Stimulus strength')\n", - " plt.ylabel('KL-divergence, awareness state')\n", - " plt.xticks(fontsize=14)\n", - " plt.yticks(fontsize=14)\n", - " plt.box(False)\n", + "However, it's crucial to understand that in the HOSS model, these ignition-like responses don't directly cause the global sharing of information in the network. Rather, they are secondary effects or byproducts of other calculations happening within the network. Essentially, these bursts of activity are outcomes of deeper processes in the network, not the direct mechanisms for distributing information throughout the system.\n", "\n", - " # Adjust layout and display the figure\n", - " plt.tight_layout()\n", - " plt.show()" + "
" ] }, { @@ -3464,8 +3368,6 @@ }, "outputs": [], "source": [ - "# to_remove solution\n", - "\n", "# Experiment parameters\n", "mu = np.array([[0.5, 0.5], [3.5, 0.5], [0.5, 3.5]])\n", "Nsubjects = 30\n", @@ -3716,9 +3618,9 @@ }, "source": [ "---\n", - "# Summary\n", + "# The Big Picture\n", "\n", - "Hakwan will now discuss the critical aspects and limitations of current consciousness studies, addressing the challenges in distinguishing theories of consciousness from those merely describing general brain functions." + "Hakwan will now discuss the critical aspects and limitations of current consciousness studies, addressing the challenges in distinguishing theories of consciousness from those merely describing general brain functions. Join us in the next two videos where we wrap up some of the big ideas and try to put them in context for you!" ] }, { @@ -3876,743 +3778,9 @@ "execution": {} }, "source": [ - "Below you'll find some optional coding & discussion bonus content!" - ] - }, - { - "cell_type": "markdown", - "id": "f862cbc2-3222-484c-98cb-993f2b591b37", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "# Coding Bonus Section\n", - "This secton contains some extra coding exercises in case you have time and inclination." - ] - }, - { - "cell_type": "markdown", - "id": "76dd7488-6558-4022-8541-22765f2967c6", - "metadata": { - "execution": {} - }, - "source": [ - "## Bonus coding exersice 1: Train a first-order network\n", - "\n", - "This section invites you to engage with a straightforward, auto-generated dataset on blindsight, originally introduced by [Pasquali et al. in 2010](https://www.sciencedirect.com/science/article/abs/pii/S0010027710001794). Blindsight is a fascinating condition where individuals who are cortically blind due to damage in their primary visual cortex can still respond to visual stimuli without conscious perception. 
This intriguing phenomenon underscores the intricate nature of sensory processing and the brain's ability to process information without conscious awareness." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8c79b0a2-8e12-44ea-a685-bba788f6685d", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Visualize the autogenerated data\n", - "factor=2\n", - "initialize_global()\n", - "set_pre, _ = create_patterns(0,factor)\n", - "plot_signal_max_and_indicator(set_pre.detach().cpu(), \"Example - Pre training dataset\")" - ] - }, - { - "cell_type": "markdown", - "id": "cff70408-8662-43f5-b930-fc2a6ffca323", - "metadata": { - "execution": {} - }, - "source": [ - "The pre-training dataset for the network consisted of 200 patterns. These were evenly divided: half were purely noise (with unit activations randomly chosen between 0.0 and 0.02), and the other half represented potential stimuli. In the stimulus patterns, 99 out of 100 units had activations ranging between 0.0 and 0.02, with one unique unit having an activation between 0.0 and 1.0." - ] - }, - { - "cell_type": "markdown", - "id": "0f45662a-08b4-44a4-89e1-200fc0c9cddb", - "metadata": { - "execution": {} - }, - "source": [ - "**Testing patterns**\n", - "\n", - "As we have seen before, the network underwent evaluations under three distinct conditions, each modifying the signal-to-noise ratio in a unique way to explore different degrees and types of blindness.\n", - "\n", - "Suprathreshold stimulus condition: here, the network was exposed to the identical set of 200 patterns used during pre-training, testing the network's response to familiar inputs.\n", - "\n", - "Subthreshold stimulus condition (blindsight simulation): this condition aimed to mimic blindsight. It was achieved by introducing a slight noise increment (+0.0012) to every input of the first-order network, barring the one designated as the stimulus. 
This setup tested the network's ability to discern faint signals amidst noise.\n", - "\n", - "Low vision condition: to simulate low vision, the activation levels of the stimuli were reduced. Unlike the range from 0.0 to 1.0 used in pre-training, the stimuli's activation levels were adjusted to span from 0.0 to 0.3. This condition examined the network's capability to recognize stimuli with diminished intensity." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "db58d78b-17d8-4651-801a-f06e568a7322", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "factor=2\n", - "# Compare your results with the patterns generate below\n", - "set_1, _ = create_patterns(0,factor)\n", - "set_2, _ = create_patterns(1,factor)\n", - "set_3, _ = create_patterns(2,factor)\n", - "\n", - "# Plot\n", - "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", - "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", - "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" - ] - }, - { - "cell_type": "markdown", - "id": "cd5c13e0-75e8-45c2-b1be-70496041364b", - "metadata": { - "execution": {} - }, - "source": [ - "### Activity 1: Building a network for a blindsight situation\n", - "\n", - "In this activity, we'll construct a neural network model using our auto-generated dataset, focusing on blindsight scenarios. The model will primarily consist of fully connected layers, establishing a straightforward, first-order network. The aim here is to assess the basic network's performance.\n", - "\n", - "**Steps to follow**\n", - "\n", - "1. Examine the network architecture: understand the structure of the neural network you're about to work with.\n", - "2. Visualize loss metrics: observe and analyze the network's performance during pre-training by visualizing the loss over epochs.\n", - "3. 
Evaluate the model: use the provided code snippets to calculate and interpret the model's accuracy, recall, and F1-score, giving you insight into the network's capabilities.\n", - "\n", - "**Understanding the process**\n", - "\n", - "The goal is to gain a thorough comprehension of the network's architecture and to interpret the pre-training results visually. This will provide a clearer picture of the model's potential and limitations.\n", - "\n", - "The network is designed as a backpropagation autoassociator. It features a 100-unit input layer, directly linked to a 40-unit hidden layer, which in turn connects to a 100-unit output layer. Initial connection weights are set within the range of -1.0 to 1.0 for the first-order network. To mitigate overfitting, dropout is employed within the network architecture. The architecture includes a configurable activation function. This flexibility allows for adjustments and tuning in Activity 3, aiming for optimal model performance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "94d0bcaf-8b49-4e35-b0d2-1b9dcc98b182", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "class FirstOrderNetwork(nn.Module):\n", - " def __init__(self, hidden_units, data_factor, use_gelu):\n", - " \"\"\"\n", - " Initializes the FirstOrderNetwork with specific configurations.\n", - "\n", - " Parameters:\n", - " - hidden_units (int): The number of units in the hidden layer.\n", - " - data_factor (int): Factor to scale the amount of data processed.\n", - " A factor of 1 indicates the default data amount,\n", - " while 10 indicates 10 times the default amount.\n", - " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", - " \"\"\"\n", - " super(FirstOrderNetwork, self).__init__()\n", - "\n", - " # Define the encoder, hidden, and decoder layers with specified units\n", - "\n", - " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", - " self.hidden= 
nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", - " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", - "\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", + "Tutorial 3 today contains some bonus material based on extensions of what we've covered today. There, you'll find some optional coding & discussion bonus content! Feel free to bookmark and come back to it whenever you are ready. \n", "\n", - "\n", - " # Dropout layer to prevent overfitting\n", - " self.dropout = nn.Dropout(0.1)\n", - "\n", - " # Set the data factor\n", - " self.data_factor = data_factor\n", - "\n", - " # Other activation functions for various purposes\n", - " self.softmax = nn.Softmax()\n", - "\n", - " # Initialize network weights\n", - " self.initialize_weights()\n", - "\n", - " def initialize_weights(self):\n", - " \"\"\"Initializes weights of the encoder, hidden, and decoder layers uniformly.\"\"\"\n", - " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", - "\n", - " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", - " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", - "\n", - " def encoder(self, x):\n", - " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", - " return h1\n", - "\n", - " def decoder(self,z):\n", - " #h2 = self.relu(self.hidden(z))\n", - " h2 = self.sigmoid(self.fc2(z))\n", - " return h2\n", - "\n", - "\n", - " def forward(self, x):\n", - " \"\"\"\n", - " Defines the forward pass through the network.\n", - "\n", - " Parameters:\n", - " - x (Tensor): The input tensor to the network.\n", - "\n", - " Returns:\n", - " - Tensor: The output of the network after passing through the layers and activations.\n", - " \"\"\"\n", - " h1 = self.encoder(x)\n", - " h2 = self.decoder(h1)\n", - "\n", - " return h1 , h2" - ] - }, - { - "cell_type": "markdown", - "id": "83e07f1a-540b-4bfa-8e9b-f114319f1f96", - "metadata": { - "execution": {} - }, - "source": [ - "For now, we will train the first order network only." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7bfade3d-6385-459c-8f07-e3017264455a", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", - "# Hyperparameters\n", - "\n", - "# Hyperparameters\n", - "global optimizer ,n_epochs , learning_rate_1\n", - "learning_rate_1 = 0.5\n", - "n_epochs = 100\n", - "optimizer=\"ADAMAX\"\n", - "hidden=40\n", - "factor=2\n", - "gelu=False\n", - "gam=0.98\n", - "meta=True\n", - "stepsize=25\n", - "initialize_global()\n", - "\n", - "\n", - "# Networks instantiation\n", - "first_order_network = FirstOrderNetwork(hidden,factor,gelu).to(device)\n", - "second_order_network = SecondOrderNetwork(gelu).to(device) # We define it, but won't use it until activity 3\n", - "\n", - "# Loss function\n", - "criterion_1 = CAE_loss\n", - "\n", - "# Optimizer\n", - "optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", - "\n", - "# Learning rate schedulers\n", - "scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", - "\n", - "max_values_output_first_order = []\n", - "max_indices_output_first_order = []\n", - "max_values_patterns_tensor = []\n", - "max_indices_patterns_tensor = []\n", - "\n", - "# Training loop\n", - "for epoch in range(n_epochs):\n", - " # Generate training patterns and targets for each epoch.\n", - " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", - "\n", - " # Forward pass through the first-order network\n", - " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", - "\n", - " output_first_order=output_first_order.requires_grad_(True)\n", - "\n", - " # Skip computations for the second-order network\n", - " with torch.no_grad():\n", - "\n", - " # Potentially forward pass through the second-order network without tracking gradients\n", - " 
output_second_order = second_order_network(patterns_tensor, output_first_order)\n", - "\n", - " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", - " W = first_order_network.state_dict()['fc1.weight']\n", - " loss_1 = criterion_1( W, stim_present_tensor.view(-1, 100), output_first_order,\n", - " hidden_representation, lam )\n", - " # Backpropagate the first-order network's loss\n", - " loss_1.backward()\n", - "\n", - " # Update first-order network weights\n", - " optimizer_1.step()\n", - "\n", - " # Reset first-order optimizer gradients to zero for the next iteration\n", - "\n", - " # Update the first-order scheduler\n", - " scheduler_1.step()\n", - "\n", - " epoch_1_order[epoch] = loss_1.item()\n", - "\n", - " # Get max values and indices for output_first_order\n", - " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", - " max_inds_out[max_vals_out == 0] = 0\n", - " max_values_output_first_order.append(max_vals_out.tolist())\n", - " max_indices_output_first_order.append(max_inds_out.tolist())\n", - "\n", - " # Get max values and indices for patterns_tensor\n", - " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", - " max_inds_pat[max_vals_pat == 0] = 0\n", - " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", - " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", - "\n", - "\n", - "max_values_indices = (max_values_output_first_order[-1],\n", - " max_indices_output_first_order[-1],\n", - " max_values_patterns_tensor[-1],\n", - " max_indices_patterns_tensor[-1])\n", - "\n", - "\n", - "# Plot training loss curve\n", - "pre_train_plots(epoch_1_order, epoch_2_order, \"1st & 2nd Order Networks\" , max_value_indices )" - ] - }, - { - "cell_type": "markdown", - "id": "fa7d1ce9-da1b-4f78-b388-47bfcd50c6dd", - "metadata": { - "execution": {} - }, - "source": [ - "### Testing under 3 blindsight conditions\n", - "\n", - "We will now use the testing auto-generated 
datasets from activity 1 to test the network's performance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2affe162-f4d9-495f-862a-65b0f50ca5ef", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "results_seed=[]\n", - "discrimination_seed=[]\n", - "\n", - "# Prepare networks for testing by calling the configuration function\n", - "testing_patterns, n_samples, loaded_model, loaded_model_2 = config_training(first_order_network, second_order_network, hidden, factor, gelu)\n", - "\n", - "# Perform testing using the defined function and plot the results\n", - "f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, loaded_model, loaded_model_2,factor)\n", - "\n", - "results_seed.append(results_for_plotting)\n", - "discrimination_seed.append(discrimination_performances)\n", - "# Assuming plot_testing is defined, call it to display results\n", - "plot_testing(results_seed,discrimination_seed, 1, \"Seed\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0d18302e-4657-4732-b6ef-f7439d2bb2fd", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_First_order_network\")" - ] - }, - { - "cell_type": "markdown", - "id": "8a694f1e-3f32-48fc-bce8-0b544d43ca62", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "## Bonus coding section 2: Plot surfaces for content / awareness inferences\n", - "\n", - "To explore the properties of the HOSS model, we can simulate inference at different levels of the hierarchy over the full 2D space of possible input X's. The left panel below shows that the probability of awareness (of any stimulus contents) rises in a graded manner from the lower left corner of the graph (low activation of any feature) to the upper right (high activation of both features). 
In contrast, the right panel shows that confidence in making a discrimination response (e.g. rightward vs. leftward) increases away from the major diagonal, as the model becomes sure that the sample was generated by either a leftward or rightward tilted stimulus.\n", - "\n", - "Together, the two surfaces make predictions about the relationships we might see between discrimination confidence and awareness in a simple psychophysics experiment. One notable prediction is that discrimination could still be possible - and lead to some degree of confidence - even when the higher-order node is \"reporting\" unawareness of the stimulus.\n", - "\n", - "Now, let's get hands on and plot those auto-generated patterns!\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "31503073-a7c0-4502-8d94-5ffa47a22926", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the grid\n", - "xgrid = np.arange(0, 2.01, 0.01)\n", - "\n", - "# Define the means for the Gaussian distributions\n", - "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", - "\n", - "# Define the covariance matrix\n", - "Sigma = np.array([[1, 0], [0, 1]])\n", - "\n", - "# Prior probabilities\n", - "Wprior = np.array([0.5, 0.5])\n", - "Aprior = 0.5\n", - "\n", - "# Initialize arrays to hold confidence and posterior probability\n", - "confW = np.zeros((len(xgrid), len(xgrid)))\n", - "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", - "\n", - "# Compute confidence and posterior probability for each point in the grid\n", - "for i, xi in tqdm(enumerate(xgrid), total=len(xgrid), desc='Outer Loop'):\n", - " for j, xj in enumerate(xgrid):\n", - " X = [xi, xj]\n", - " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", - " confW[i, j] = max(post_w[1], post_w[2])\n", - " posteriorAware[i, j] = post_A[1]\n", - "\n", - "with 
plt.xkcd():\n", - "\n", - " # Plotting\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # Posterior probability \"seen\"\n", - " plt.subplot(1, 2, 1)\n", - " plt.contourf(xgrid, xgrid, posteriorAware.T)\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('Posterior probability \"seen\"')\n", - " plt.axis('square')\n", - "\n", - " # Confidence in identity\n", - " plt.subplot(1, 2, 2)\n", - " contour_set = plt.contourf(xgrid, xgrid, confW.T)\n", - " plt.colorbar()\n", - " plt.contour(xgrid, xgrid, posteriorAware.T, levels=[0.5], linewidths=4, colors=['white']) # Line contour for threshold\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('Confidence in identity')\n", - " plt.axis('square')\n", - "\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "2d129657-62aa-42d1-970a-93fd67736b69", - "metadata": { - "execution": {} - }, - "source": [ - "**Simulate KL-divergence surfaces**\n", - "\n", - "We can also simulate KL-divergences (a measure of Bayesian surprise) at each layer in the network, which under predictive coding models of the brain, has been proposed to scale with neural activation (e.g., Friston, 2005; Summerfield & de Lange, 2014)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "66044263-c8de-49a9-a56b-2e7336cc737c", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the grid\n", - "xgrid = np.arange(0, 2.01, 0.01)\n", - "\n", - "# Define the means for the Gaussian distributions\n", - "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", - "\n", - "# Define the covariance matrix\n", - "Sigma = np.array([[1, 0], [0, 1]])\n", - "\n", - "# Prior probabilities\n", - "Wprior = np.array([0.5, 0.5])\n", - "Aprior = 0.5\n", - "\n", - "# Initialize arrays to hold confidence and posterior probability\n", - "confW = np.zeros((len(xgrid), len(xgrid)))\n", - "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", - "\n", - "# Compute confidence and posterior probability for each point in the grid\n", - "for i, xi in enumerate(xgrid):\n", - " for j, xj in enumerate(xgrid):\n", - " X = [xi, xj]\n", - " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", - "\n", - " confW[i, j] = max(post_w[1], post_w[2])\n", - " posteriorAware[i, j] = post_A[1]\n", - "\n", - "# Calculate the mean K-L divergence for absent and present awareness states\n", - "KL_A_absent = np.mean(KL_A[posteriorAware < 0.5])\n", - "KL_A_present = np.mean(KL_A[posteriorAware >= 0.5])\n", - "KL_w_absent = np.mean(KL_w[posteriorAware < 0.5])\n", - "KL_w_present = np.mean(KL_w[posteriorAware >= 0.5])\n", - "\n", - "with plt.xkcd():\n", - "\n", - " # Plotting\n", - " plt.figure(figsize=(18, 6))\n", - "\n", - " # K-L divergence, perceptual states\n", - " plt.subplot(1, 2, 1)\n", - " plt.contourf(xgrid, xgrid, KL_w.T, cmap='viridis')\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('KL-divergence, perceptual states')\n", - " plt.axis('square')\n", - "\n", - " # K-L divergence, awareness state\n", - " plt.subplot(1, 2, 
2)\n", - " plt.contourf(xgrid, xgrid, KL_A.T, cmap='viridis')\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('KL-divergence, awareness state')\n", - " plt.axis('square')\n", - "\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "b32e4908-0f6f-4259-832f-045adcb19700", - "metadata": { - "execution": {} - }, - "source": [ - "### Discussion point\n", - "\n", - "Can you recognise the difference between the KL divergence for the W-level and the one for the A-level?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7d8deb66-9a1d-49e1-a3ef-96970efa8d97", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove explanation\n", - "\"\"\"\n", - "At the level of perceptual states W, there is a substantial asymmetry in the KL-divergence expected when the\n", - "model says ‘seen’ vs. ‘unseen’ (lefthand panel). This is due to the large belief updates invoked in the\n", - "perceptual layer W by samples that deviate from the lower lefthand corner - from absence. In contrast, when\n", - "we compute KL-divergence for the A-level (righthand panel), the level of prediction error is symmetric across\n", - "seen and unseen decisions, leading to \"hot\" zones both at the upper righthand (present) and lower lefthand\n", - "(absent) corners of the 2D space.\n", - "\n", - "Intuitively, this means that at the W-level, there's a noticeable difference in the KL-divergence values\n", - "between \"seen\" and \"unseen\" predictions. This large difference is mainly due to significant updates in the\n", - "model's beliefs at this level when the detected samples are far from what is expected under the condition of\n", - "\"absence.\" However, when we analyze the K-L divergence at the A-level, the discrepancies in prediction errors\n", - "between \"seen\" and \"unseen\" are balanced. 
This creates equally strong responses in the model, whether something\n", - "is detected or not detected.\n", - "\n", - "We can also sort the KL-divergences as a function of whether the model \"reported\" presence or absence. As\n", - "can be seen in the bar plots below, there is more asymmetry in the prediction error at the W compared to the\n", - "A levels.\n", - "\n", - "\"\"\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "869fc8f1-4199-4525-80b3-26e74babc66a", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "with plt.xkcd():\n", - "\n", - " # Create figure with specified size\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # KL divergence for W states\n", - " plt.subplot(1, 2, 1)\n", - " plt.bar(['unseen', 'seen'], [KL_w_absent, KL_w_present], color='k')\n", - " plt.ylabel('KL divergence, W states')\n", - " plt.xticks(fontsize=18)\n", - " plt.yticks(fontsize=18)\n", - "\n", - " # KL divergence for A states\n", - " plt.subplot(1, 2, 2)\n", - " plt.bar(['unseen', 'seen'], [KL_A_absent, KL_A_present], color='k')\n", - " plt.ylabel('KL divergence, A states')\n", - " plt.xticks(fontsize=18)\n", - " plt.yticks(fontsize=18)\n", - "\n", - " plt.tight_layout()\n", - "\n", - " # Show plot\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "64ecb92c-bfe3-4e49-bd40-f11ffa685ece", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_HOSS_Bonus_Content\")" - ] - }, - { - "cell_type": "markdown", - "id": "bcd87344-d473-44af-a881-b68e5471d353", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "# Discussion Bonus Section\n", - "This section contains an extra discussion exercise if you have time and inclination." 
- ] - }, - { - "cell_type": "markdown", - "id": "ca33829c-8d54-437e-ba33-d3003af51d7a", - "metadata": { - "execution": {} - }, - "source": [ - "In this bonus section, Megan and Anil will delve into the complexities of defining and testing for consciousness, particularly in the context of artificial intelligence. We will explore various theoretical perspectives, examine classic and contemporary tests for consciousness, and discuss the challenges and ethical implications of determining whether a system truly possesses conscious experience." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "21b621ea-1639-4131-8ec3-9cdf34a64f77", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 11: Consciousness Bonus Content\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 
'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "video_ids = [('Youtube', '00dL8q7WgcU'), ('Bilibili', 'BV12n4y1Q7C2')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "39c202fb-f580-4a96-8f8e-bad24ed1d55c", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_Video_11\")" - ] - }, - { - "cell_type": "markdown", - "id": "e9e839f2-e237-4ed4-9045-56dc7b5f6d60", - "metadata": { - "execution": {} - }, - "source": [ - "## Discussion activity: Is it actually conscious?" - ] - }, - { - "cell_type": "markdown", - "id": "2720c0b5-6386-43a6-9647-f1245531c376", - "metadata": { - "execution": {} - }, - "source": [ - "We discussed the difference between these two...\n", - "- \"Forward\" tests: passing means the machine is conscious (or intelligent).\n", - "- \"Reverse\" tests: passing means humans are convinced that a machine is conscious (or intelligent).\n", - "\n", - "**Discuss!** If a system (AI, other animal, other human) exhibited all the \"right signs\" of being conscious, how can we know for sure it is actually conscious? How could you design a test to be a true forward test?\n", - "\n", - "- Room 1: I think you could design a forward test in this way... 
[share your ideas]\n", - "- Room 2: I think a forward test is impossible, and here's why [share your ideas]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "84958157-c165-4cc3-be76-408999cf44ad", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_Discussion_activity\")" + "For the moment, let's switch to our second topic of the day, arguably one of the most important topics we've covered so far: Ethics." ] } ], @@ -4644,7 +3812,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/W2D5_Tutorial2.ipynb b/tutorials/W2D5_Mysteries/W2D5_Tutorial2.ipynb index ad486f06a..099132da9 100644 --- a/tutorials/W2D5_Mysteries/W2D5_Tutorial2.ipynb +++ b/tutorials/W2D5_Mysteries/W2D5_Tutorial2.ipynb @@ -25,9 +25,9 @@ "\n", "__Content creators:__ Megan Peters, Joshua Shepherd, Jana Schaich Borg\n", "\n", - "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Alex Murphy\n", "\n", - "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk\n" + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Alex Murphy\n" ] }, { @@ -542,7 +542,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/W2D5_Tutorial3.ipynb b/tutorials/W2D5_Mysteries/W2D5_Tutorial3.ipynb new file mode 100644 index 000000000..c38b65e0a --- /dev/null +++ b/tutorials/W2D5_Mysteries/W2D5_Tutorial3.ipynb @@ -0,0 +1,2404 @@ +{ + "cells": [ + { + "cell_type":
"markdown", + "id": "89a00b06-154b-4aaf-8bee-b96a675406b5", + "metadata": { + "execution": {} + }, + "source": [ + "<a href=\"https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a> &nbsp; <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb\" target=\"_parent\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open in Kaggle\"/></a>" + ] + }, + { + "cell_type": "markdown", + "id": "82ed61a3-87d2-4e76-83f6-4b786c101af2", + "metadata": { + "execution": {} + }, + "source": [ + "# (Bonus) Tutorial 3: Consciousness (Extended)\n", + "\n", + "**Week 2, Day 5: Mysteries**\n", + "\n", + "**By Neuromatch Academy**\n", + "\n", + "__Content creators:__ Steve Fleming, Guillaume Dumas, Samuele Bolotta, Juan David Vargas, Hakwan Lau, Anil Seth, Megan Peters\n", + "\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault, Alex Murphy\n", + "\n", + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault, Alex Murphy\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7861818a", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Install and import feedback gadget\n", + "\n", + "!pip install vibecheck numpy matplotlib Pillow torch torchvision transformers ipywidgets gradio trdg scikit-learn networkx pickleshare seaborn tabulate --quiet\n", + "\n", + "from vibecheck import DatatopsContentReviewContainer\n", + "def content_review(notebook_section: str):\n", + " return DatatopsContentReviewContainer(\n", + " \"\", # No text prompt\n", + " notebook_section,\n", + " {\n", + " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", + " \"name\": \"neuromatch_neuroai\",\n", + " \"user_key\": \"wb2cxze8\",\n", + " },\n", + " ).render()\n", + "\n", + "feedback_prefix = \"W2D5_T3\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c4e3a7d", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Import dependencies\n", + "# @markdown\n", + "\n", + "import contextlib\n", + "import io\n", +
"\n", + "with contextlib.redirect_stdout(io.StringIO()):\n", + " # Standard Libraries\n", + " import copy\n", + " import logging\n", + " import os\n", + " import random\n", + " import requests\n", + "\n", + " # Data Handling and Visualization Libraries\n", + " import numpy as np\n", + " import pandas as pd\n", + " import matplotlib.pyplot as plt\n", + " import seaborn as sns\n", + " from sklearn.metrics import precision_score, recall_score, fbeta_score\n", + " from sklearn.linear_model import LinearRegression\n", + " from tabulate import tabulate\n", + "\n", + " # Scientific Computing and Statistical Libraries\n", + " from numpy.linalg import inv\n", + " from scipy.special import logsumexp\n", + " from scipy.stats import multivariate_normal\n", + "\n", + " # Deep Learning Libraries\n", + " import torch\n", + " from torch import nn, optim, save, load\n", + " from torch.nn import functional as F\n", + " from torch.utils.data import DataLoader\n", + " import torch.nn.init as init\n", + " from torch.optim.lr_scheduler import StepLR\n", + "\n", + " # Image Processing Libraries\n", + " from PIL import Image\n", + " from matplotlib.patches import Patch\n", + " from mpl_toolkits.mplot3d import Axes3D\n", + "\n", + " # Interactive Elements and Web Applications\n", + " from IPython.display import IFrame\n", + " from IPython.display import Image as IMG\n", + " import gradio as gr\n", + " import ipywidgets as widgets\n", + " from ipywidgets import interact, IntSlider\n", + "\n", + " # Graph Analysis Libraries\n", + " import networkx as nx\n", + "\n", + " # Progress Monitoring Libraries\n", + " from tqdm import tqdm\n", + "\n", + " # Utilities and Miscellaneous Libraries\n", + " from itertools import product\n", + "\n", + " import math\n", + " !pip install torch_optimizer\n", + " import torch_optimizer as optim2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00f889a6", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + 
"source": [ + "# @title Figure settings\n", + "# @markdown\n", + "\n", + "logging.getLogger('matplotlib.font_manager').disabled = True\n", + "\n", + "%matplotlib inline\n", + "%config InlineBackend.figure_format = 'retina' # perform high definition rendering for images and plots\n", + "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98ca7c55", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Set device (GPU or CPU)\n", + "\n", + "def set_device():\n", + " \"\"\"\n", + " Determines and sets the computational device for PyTorch operations based on the availability of a CUDA-capable GPU.\n", + "\n", + " Outputs:\n", + " - device (str): The device that PyTorch will use for computations ('cuda' or 'cpu'). This string can be directly used\n", + " in PyTorch operations to specify the device.\n", + " \"\"\"\n", + "\n", + " device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", + " if device != \"cuda\":\n", + " print(\"GPU is not enabled in this notebook. \\n\"\n", + " \"If you want to enable it, in the menu under `Runtime` -> \\n\"\n", + " \"`Hardware accelerator.` and select `GPU` from the dropdown menu\")\n", + " else:\n", + " print(\"GPU is enabled in this notebook. 
\\n\"\n", + " \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n", + " \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n", + "\n", + " return device\n", + "\n", + "device = set_device()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2508d8b4", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Helper functions\n", + "\n", + "mse_loss = nn.BCELoss(size_average = False)\n", + "\n", + "lam = 1e-4\n", + "\n", + "from torch.autograd import Variable\n", + "\n", + "def CAE_loss(W, x, recons_x, h, lam):\n", + " \"\"\"Compute the Contractive AutoEncoder Loss\n", + "\n", + " Evalutes the CAE loss, which is composed as the summation of a Mean\n", + " Squared Error and the weighted l2-norm of the Jacobian of the hidden\n", + " units with respect to the inputs.\n", + "\n", + "\n", + " See reference below for an in-depth discussion:\n", + " #1: http://wiseodd.github.io/techblog/2016/12/05/contractive-autoencoder\n", + "\n", + " Args:\n", + " `W` (FloatTensor): (N_hidden x N), where N_hidden and N are the\n", + " dimensions of the hidden units and input respectively.\n", + " `x` (Variable): the input to the network, with dims (N_batch x N)\n", + " recons_x (Variable): the reconstruction of the input, with dims\n", + " N_batch x N.\n", + " `h` (Variable): the hidden units of the network, with dims\n", + " batch_size x N_hidden\n", + " `lam` (float): the weight given to the jacobian regulariser term\n", + "\n", + " Returns:\n", + " Variable: the (scalar) CAE loss\n", + " \"\"\"\n", + " mse = mse_loss(recons_x, x)\n", + " # Since: W is shape of N_hidden x N. 
So, we do not need to transpose it as\n", + " # opposed to #1\n", + " dh = h * (1 - h) # Hadamard product produces size N_batch x N_hidden\n", + " # Sum through the input dimension to improve efficiency, as suggested in #1\n", + " w_sum = torch.sum(Variable(W)**2, dim=1)\n", + " # unsqueeze to avoid issues with torch.mv\n", + " w_sum = w_sum.unsqueeze(1) # shape N_hidden x 1\n", + " contractive_loss = torch.sum(torch.mm(dh**2, w_sum), 0)\n", + " return mse + contractive_loss.mul_(lam)\n", + "\n", + "class FirstOrderNetwork(nn.Module):\n", + " def __init__(self, hidden_units, data_factor, use_gelu):\n", + " \"\"\"\n", + " Initializes the FirstOrderNetwork with specific configurations.\n", + "\n", + " Parameters:\n", + " - hidden_units (int): The number of units in the hidden layer.\n", + " - data_factor (int): Factor to scale the amount of data processed.\n", + " A factor of 1 indicates the default data amount,\n", + " while 10 indicates 10 times the default amount.\n", + " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", + " \"\"\"\n", + " super(FirstOrderNetwork, self).__init__()\n", + "\n", + " # Define the encoder, hidden, and decoder layers with specified units\n", + "\n", + " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", + " self.hidden= nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", + " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", + "\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + "\n", + "\n", + " # Dropout layer to prevent overfitting\n", + " self.dropout = nn.Dropout(0.1)\n", + "\n", + " # Set the data factor\n", + " self.data_factor = data_factor\n", + "\n", + " # Other activation functions for various purposes\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize network weights\n", + " self.initialize_weights()\n", + "\n", + " def initialize_weights(self):\n", + " \"\"\"Initializes weights of the encoder, hidden, 
and decoder layers uniformly.\"\"\"\n", + " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", + " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", + " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", + "\n", + " def encoder(self, x):\n", + " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", + " return h1\n", + "\n", + " def decoder(self,z):\n", + " #h2 = self.relu(self.hidden(z))\n", + " h2 = self.sigmoid(self.fc2(z))\n", + " return h2\n", + "\n", + " def forward(self, x):\n", + " \"\"\"\n", + " Defines the forward pass through the network.\n", + "\n", + " Parameters:\n", + " - x (Tensor): The input tensor to the network.\n", + "\n", + " Returns:\n", + " - Tensor: The output of the network after passing through the layers and activations.\n", + " \"\"\"\n", + " h1 = self.encoder(x)\n", + " h2 = self.decoder(h1)\n", + "\n", + " return h1 , h2\n", + "\n", + "class SecondOrderNetwork(nn.Module):\n", + " def __init__(self, use_gelu):\n", + " super(SecondOrderNetwork, self).__init__()\n", + " # Define a linear layer for comparing the difference between input and output of the first-order network\n", + " self.comparison_layer = nn.Linear(100, 100)\n", + "\n", + " # Linear layer for determining wagers, mapping from 100 features to a single output\n", + " self.wager = nn.Linear(100, 1)\n", + "\n", + " # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " # Select activation function based on the `use_gelu` flag\n", + " self.activation = torch.relu\n", + "\n", + " # Additional activation functions for potential use in network operations\n", + " self.sigmoid = torch.sigmoid\n", + "\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize the weights of the network\n", + " self._init_weights()\n", + "\n", + " def _init_weights(self):\n", + " # Uniformly initialize weights for the comparison and wager layers\n", + " 
init.uniform_(self.comparison_layer.weight, -1.0, 1.0)\n", + " init.uniform_(self.wager.weight, 0.0, 0.1)\n", + "\n", + " def forward(self, first_order_input, first_order_output):\n", + " # Calculate the difference between the first-order input and output\n", + " comparison_matrix = first_order_input - first_order_output\n", + "\n", + " #Another option is to directly calculate the per unit MSE to use as input for the comparator matrix\n", + " #comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input)\n", + "\n", + " # Pass the difference through the comparison layer and apply the chosen activation function\n", + " comparison_out=self.dropout(self.activation(self.comparison_layer(comparison_matrix)))\n", + "\n", + " # Calculate the wager value, applying dropout and sigmoid activation to the output of the wager layer\n", + " wager = self.sigmoid(self.wager(comparison_out))\n", + "\n", + " return wager\n", + "\n", + "def initialize_global():\n", + " global Input_Size_1, Hidden_Size_1, Output_Size_1, Input_Size_2\n", + " global num_units, patterns_number\n", + " global learning_rate_2, momentum, temperature , Threshold\n", + " global First_set, Second_set, Third_set\n", + " global First_set_targets, Second_set_targets, Third_set_targets\n", + " global epoch_list, epoch_1_order, epoch_2_order, patterns_matrix1\n", + " global testing_graph_names\n", + "\n", + " global optimizer ,n_epochs , learning_rate_1\n", + " learning_rate_1 = 0.5\n", + " n_epochs = 100\n", + " optimizer=\"ADAMAX\"\n", + "\n", + " # Network sizes\n", + " Input_Size_1 = 100\n", + " Hidden_Size_1 = 60\n", + " Output_Size_1 = 100\n", + " Input_Size_2 = 100\n", + "\n", + " # Patterns\n", + " num_units = 100\n", + " patterns_number = 200\n", + "\n", + " # Pre-training and hyperparameters\n", + " learning_rate_2 = 0.1\n", + " momentum = 0.9\n", + " temperature = 1.0\n", + " Threshold=0.5\n", + "\n", + " # Testing\n", + " First_set = []\n", + " Second_set = []\n", + " Third_set 
= []\n", + " First_set_targets = []\n", + " Second_set_targets = []\n", + " Third_set_targets = []\n", + "\n", + " # Graphic of pretraining\n", + " epoch_list = list(range(1, n_epochs + 1))\n", + " epoch_1_order = np.zeros(n_epochs)\n", + " epoch_2_order = np.zeros(n_epochs)\n", + " patterns_matrix1 = torch.zeros((n_epochs, patterns_number), device=device) # Initialize patterns_matrix as a PyTorch tensor on the GPU\n", + "\n", + "def compute_metrics(TP, TN, FP, FN):\n", + " \"\"\"Compute precision, recall, F1 score, and accuracy.\"\"\"\n", + " precision = round(TP / (TP + FP), 2) if (TP + FP) > 0 else 0\n", + " recall = round(TP / (TP + FN), 2) if (TP + FN) > 0 else 0\n", + " f1_score = round(2 * (precision * recall) / (precision + recall), 2) if (precision + recall) > 0 else 0\n", + " accuracy = round((TP + TN) / (TP + TN + FP + FN), 2) if (TP + TN + FP + FN) > 0 else 0\n", + " return precision, recall, f1_score, accuracy\n", + "\n", + "# define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "def prepare_pre_training(hidden,factor,gelu,stepsize, gam):\n", + "\n", + " first_order_network = FirstOrderNetwork(hidden, factor, gelu).to(device)\n", + " second_order_network = SecondOrderNetwork(gelu).to(device)\n", + "\n", + " criterion_1 = CAE_loss\n", + " criterion_2 = nn.BCELoss(size_average = False)\n", + "\n", + "\n", + " if optimizer == \"ADAM\":\n", + " optimizer_1 = optim.Adam(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.Adam(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"SGD\":\n", + " optimizer_1 = optim.SGD(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.SGD(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"SWATS\":\n", + " optimizer_1 = optim2.SWATS(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = 
optim2.SWATS(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"ADAMW\":\n", + " optimizer_1 = optim.AdamW(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.AdamW(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"RMS\":\n", + " optimizer_1 = optim.RMSprop(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.RMSprop(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"ADAMAX\":\n", + " optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.Adamax(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " # Learning rate schedulers\n", + " scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", + " scheduler_2 = StepLR(optimizer_2, step_size=stepsize, gamma=gam)\n", + "\n", + " return first_order_network, second_order_network, criterion_1 , criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2\n", + "\n", + "def title(string):\n", + " # Enable XKCD plot styling\n", + " with plt.xkcd():\n", + " # Create a figure and an axes.\n", + " fig, ax = plt.subplots()\n", + "\n", + " # Create a rectangle patch with specified dimensions and styles (plt.Rectangle avoids a separate matplotlib.patches import)\n", + " rectangle = plt.Rectangle((0.05, 0.1), 0.9, 0.4, linewidth=1, edgecolor='r', facecolor='blue', alpha=0.5)\n", + " ax.add_patch(rectangle)\n", + "\n", + " # Place text inside the rectangle, centered\n", + " plt.text(0.5, 0.3, string, horizontalalignment='center', verticalalignment='center', fontsize=26, color='white')\n", + "\n", + " # Set plot limits\n", + " ax.set_xlim(0, 1)\n", + " ax.set_ylim(0, 1)\n", + "\n", + " # Disable axis display\n", + " ax.axis('off')\n", + "\n", + " # Display the plot\n", + " plt.show()\n", + "\n", + " # Close the figure to free up memory\n", + " plt.close(fig)\n", + "\n", + "# Function to configure the training environment and load
the models\n", + "def get_test_patterns(factor):\n", + " \"\"\"\n", + " Generates the three sets of testing patterns used for evaluation.\n", + " The number of patterns per set is scaled by the `factor` argument.\n", + "\n", + " Returns:\n", + " - Tuple of testing patterns, number of samples in the testing patterns\n", + " \"\"\"\n", + " # Generating testing patterns for three different sets\n", + " first_set, first_set_targets = create_patterns(0,factor)\n", + " second_set, second_set_targets = create_patterns(1,factor)\n", + " third_set, third_set_targets = create_patterns(2,factor)\n", + "\n", + " # Aggregate testing patterns and their targets for ease of access\n", + " testing_patterns = [[first_set, first_set_targets], [second_set, second_set_targets], [third_set, third_set_targets]]\n", + "\n", + " # Determine the number of samples from the first set (assumed consistent across all sets)\n", + " n_samples = len(testing_patterns[0][0])\n", + "\n", + " return testing_patterns, n_samples\n", + "\n", + "# Function to plot the input and output of the first-order network\n", + "def plot_input_output(input_data, output_data, index):\n", + " fig, axes = plt.subplots(1, 2, figsize=(10, 6))\n", + "\n", + " # Plot input data\n", + " im1 = axes[0].imshow(input_data.cpu().numpy(), aspect='auto', cmap='viridis')\n", + " axes[0].set_title('Input')\n", + " fig.colorbar(im1, ax=axes[0])\n", + "\n", + " # Plot output data\n", + " im2 = axes[1].imshow(output_data.cpu().numpy(), aspect='auto', cmap='viridis')\n", + " axes[1].set_title('Output')\n", + " fig.colorbar(im2, ax=axes[1])\n", + "\n", + " plt.suptitle(f'Testing Pattern {index+1}')\n", + " plt.show()\n", + "\n", + "\n", + "# Function to test the model using the configured testing patterns\n", + "def testing(testing_patterns, n_samples, loaded_model, loaded_model_2,factor):\n", + "\n", + " def generate_chance_level(shape):\n", + " chance_level = 
np.random.rand(*shape).tolist()\n", + "        return chance_level\n", + "\n", + "    results_for_plotting = []\n", + "    max_values_output_first_order = []\n", + "    max_indices_output_first_order = []\n", + "    max_values_patterns_tensor = []\n", + "    max_indices_patterns_tensor = []\n", + "    f1_scores_wager = []\n", + "\n", + "    mse_losses_indices = []\n", + "    mse_losses_values = []\n", + "    discrimination_performances = []\n", + "\n", + "    # Iterate through each set of testing patterns and targets\n", + "    for i in range(len(testing_patterns)):\n", + "        with torch.no_grad(): # Ensure no gradients are computed during testing\n", + "\n", + "            # For the low vision condition the stimulus multiplier is 0.3 (as can be seen in the generate_patterns function), so a lower wager threshold is used\n", + "            threshold = 0.5\n", + "            if i == 2:\n", + "                threshold = 0.15\n", + "\n", + "            # Obtain output from the first-order model\n", + "            input_data = testing_patterns[i][0]\n", + "            hidden_representation, output_first_order = loaded_model(input_data)\n", + "            output_second_order = loaded_model_2(input_data, output_first_order)\n", + "\n", + "            # Index of the first stimulus pattern (the first half of each set is noise)\n", + "            delta = 100 * factor\n", + "\n", + "            print(\"discrimination performance\")\n", + "            print((output_first_order[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean())\n", + "            discrimination_performance = round((output_first_order[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", + "            discrimination_performances.append(discrimination_performance)\n", + "\n", + "            chance_level = torch.Tensor(generate_chance_level((200 * factor, 100))).to(device)\n", + "            discrimination_random = round((chance_level[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", + "            print(\"chance level\", discrimination_random)\n", + "\n", + "            # Wagers of the second-order network on the stimulus half of the patterns\n", + "            wagers = output_second_order[delta:].cpu()\n", + "\n", + "            _, targets_2 = torch.max(testing_patterns[i][1], 1)\n", + "            targets_2 = 
targets_2[delta:].cpu()\n", + "\n", + " # Convert targets to binary classification for wagering scenario\n", + " targets_2 = (targets_2 > 0).int()\n", + "\n", + " # Convert tensors to NumPy arrays for metric calculations\n", + " predicted_np = wagers.numpy().flatten()\n", + " targets_2_np = targets_2.numpy()\n", + "\n", + " #print(\"number of targets,\" , len(targets_2_np))\n", + "\n", + " print(predicted_np)\n", + " print(targets_2_np)\n", + "\n", + " # Calculate True Positives, True Negatives, False Positives, and False Negatives\n", + " TP = np.sum((predicted_np > threshold) & (targets_2_np > threshold))\n", + " TN = np.sum((predicted_np < threshold ) & (targets_2_np < threshold))\n", + " FP = np.sum((predicted_np > threshold) & (targets_2_np < threshold))\n", + " FN = np.sum((predicted_np < threshold) & (targets_2_np > threshold))\n", + "\n", + " # Compute precision, recall, F1 score, and accuracy for both high and low wager scenarios\n", + " precision_h, recall_h, f1_score_h, accuracy_h = compute_metrics(TP, TN, FP, FN)\n", + "\n", + " f1_scores_wager.append(f1_score_h)\n", + "\n", + " # Collect results for plotting\n", + " results_for_plotting.append({\n", + " \"counts\": [[TP, FP, TP + FP]],\n", + " \"metrics\": [[precision_h, recall_h, f1_score_h, accuracy_h]],\n", + " \"title_results\": f\"Results Table - Set {i+1}\",\n", + " \"title_metrics\": f\"Metrics Table - Set {i+1}\"\n", + " })\n", + "\n", + " # Plot input and output of the first-order network\n", + " plot_input_output(input_data, output_first_order, i)\n", + "\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " max_vals_pat, max_inds_pat = torch.max(input_data[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " 
max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + " fig, axs = plt.subplots(1, 2, figsize=(15, 5))\n", + "\n", + " # Scatter plot of indices: patterns_tensor vs. output_first_order\n", + " axs[0].scatter(max_indices_patterns_tensor[i], max_indices_output_first_order[i], alpha=0.5)\n", + " axs[0].set_title(f'Stimuli location: Condition {i+1} - First Order Input vs. First Order Output')\n", + " axs[0].set_xlabel('First Order Input Indices')\n", + " axs[0].set_ylabel('First Order Output Indices')\n", + "\n", + " # Add quadratic fit to scatter plot\n", + " x_indices = max_indices_patterns_tensor[i]\n", + " y_indices = max_indices_output_first_order[i]\n", + " y_pred_indices = perform_quadratic_regression(x_indices, y_indices)\n", + " axs[0].plot(x_indices, y_pred_indices, color='skyblue')\n", + "\n", + "\n", + " # Calculate MSE loss for indices\n", + " mse_loss_indices = np.mean((np.array(x_indices) - np.array(y_indices)) ** 2)\n", + " mse_losses_indices.append(mse_loss_indices)\n", + "\n", + " # Scatter plot of values: patterns_tensor vs. output_first_order\n", + " axs[1].scatter(max_values_patterns_tensor[i], max_values_output_first_order[i], alpha=0.5)\n", + " axs[1].set_title(f'Stimuli Values: Condition {i+1} - First Order Input vs. 
First Order Output')\n", + " axs[1].set_xlabel('First Order Input Values')\n", + " axs[1].set_ylabel('First Order Output Values')\n", + "\n", + " # Add quadratic fit to scatter plot\n", + " x_values = max_values_patterns_tensor[i]\n", + " y_values = max_values_output_first_order[i]\n", + " y_pred_values = perform_quadratic_regression(x_values, y_values)\n", + " axs[1].plot(x_values, y_pred_values, color='skyblue')\n", + "\n", + " # Calculate MSE loss for values\n", + " mse_loss_values = np.mean((np.array(x_values) - np.array(y_values)) ** 2)\n", + " mse_losses_values.append(mse_loss_values)\n", + "\n", + " plt.tight_layout()\n", + " plt.show()\n", + "\n", + " return f1_scores_wager, mse_losses_indices , mse_losses_values, discrimination_performances, results_for_plotting\n", + "\n", + "def generate_patterns(patterns_number, num_units, factor, condition = 0):\n", + " \"\"\"\n", + " Generates patterns and targets for training the networks\n", + "\n", + " # patterns_number: Number of patterns to generate\n", + " # num_units: Number of units in each pattern\n", + " # pattern: 0: superthreshold, 1: subthreshold, 2: low vision\n", + " # Returns lists of patterns, stimulus present/absent indicators, and second order targets\n", + " \"\"\"\n", + "\n", + " patterns_number= patterns_number*factor\n", + "\n", + " patterns = [] # Store generated patterns\n", + " stim_present = [] # Indicators for when a stimulus is present in the pattern\n", + " stim_absent = [] # Indicators for when no stimulus is present\n", + " order_2_pr = [] # Second order network targets based on the presence or absence of stimulus\n", + "\n", + " if condition == 0:\n", + " random_limit= 0.0\n", + " baseline = 0\n", + " multiplier = 1\n", + "\n", + " if condition == 1:\n", + " random_limit= 0.02\n", + " baseline = 0.0012\n", + " multiplier = 1\n", + "\n", + " if condition == 2:\n", + " random_limit= 0.02\n", + " baseline = 0.0012\n", + " multiplier = 0.3\n", + "\n", + " # Generate patterns, half noise 
and half potential stimuli\n", + " for i in range(patterns_number):\n", + "\n", + " # First half: Noise patterns\n", + " if i < patterns_number // 2:\n", + "\n", + " pattern = multiplier * np.random.uniform(0.0, random_limit, num_units) + baseline # Generate a noise pattern\n", + " patterns.append(pattern)\n", + " stim_present.append(np.zeros(num_units)) # Stimulus absent\n", + " order_2_pr.append([0.0 , 1.0]) # No stimulus, low wager\n", + "\n", + " # Second half: Stimulus patterns\n", + " else:\n", + " stimulus_number = random.randint(0, num_units - 1) # Choose a unit for potential stimulus\n", + " pattern = np.random.uniform(0.0, random_limit, num_units) + baseline\n", + " pattern[stimulus_number] = np.random.uniform(0.0, 1.0) * multiplier # Set stimulus intensity\n", + "\n", + " patterns.append(pattern)\n", + " present = np.zeros(num_units)\n", + " # Determine if stimulus is above discrimination threshold\n", + " if pattern[stimulus_number] >= multiplier/2:\n", + " order_2_pr.append([1.0 , 0.0]) # Stimulus detected, high wager\n", + " present[stimulus_number] = 1.0\n", + " else:\n", + " order_2_pr.append([0.0 , 1.0]) # Stimulus not detected, low wager\n", + " present[stimulus_number] = 0.0\n", + "\n", + " stim_present.append(present)\n", + "\n", + "\n", + " patterns_tensor = torch.Tensor(patterns).to(device).requires_grad_(True)\n", + " stim_present_tensor = torch.Tensor(stim_present).to(device).requires_grad_(True)\n", + " stim_absent_tensor= torch.Tensor(stim_absent).to(device).requires_grad_(True)\n", + " order_2_tensor = torch.Tensor(order_2_pr).to(device).requires_grad_(True)\n", + "\n", + " return patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor\n", + "\n", + "def create_patterns(stimulus,factor):\n", + " \"\"\"\n", + " Generates neural network input patterns based on specified stimulus conditions.\n", + "\n", + " Parameters:\n", + " - stimulus (int): Determines the type of patterns to generate.\n", + " Acceptable values:\n", + " - 
0: Suprathreshold stimulus\n", + " - 1: Subthreshold stimulus\n", + " - 2: Low vision condition\n", + "\n", + " Returns:\n", + " - torch.Tensor: Tensor of generated patterns.\n", + " - torch.Tensor: Tensor of target values corresponding to the generated patterns.\n", + " \"\"\"\n", + "\n", + " # Generate initial patterns and target tensors for base condition.\n", + "\n", + " patterns_tensor, stim_present_tensor, _, _ = generate_patterns(patterns_number, num_units ,factor, stimulus)\n", + " # Convert pattern tensors for processing on specified device (CPU/GPU).\n", + " patterns = torch.Tensor(patterns_tensor).to(device)\n", + " targets = torch.Tensor(stim_present_tensor).to(device)\n", + "\n", + " return patterns, targets\n", + "\n", + "def pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta):\n", + " \"\"\"\n", + " Conducts pre-training for first-order and second-order networks.\n", + "\n", + " Parameters:\n", + " - first_order_network (torch.nn.Module): Network for basic input-output mapping.\n", + " - second_order_network (torch.nn.Module): Network for decision-making based on the first network's output.\n", + " - criterion_1, criterion_2 (torch.nn): Loss functions for the respective networks.\n", + " - optimizer_1, optimizer_2 (torch.optim): Optimizers for the respective networks.\n", + " - scheduler_1, scheduler_2 (torch.optim.lr_scheduler): Schedulers for learning rate adjustment.\n", + " - factor (float): Parameter influencing data augmentation or pattern generation.\n", + " - meta (bool): Flag indicating the use of meta-learning strategies.\n", + "\n", + " Returns:\n", + " Tuple containing updated networks and epoch-wise loss records.\n", + "\n", + " \"\"\"\n", + " def get_num_args(func):\n", + " return func.__code__.co_argcount\n", + "\n", + " max_values_output_first_order = []\n", + " max_indices_output_first_order = []\n", + " max_values_patterns_tensor = []\n", + 
" max_indices_patterns_tensor = []\n", + "\n", + " epoch_1_order = np.zeros(n_epochs)\n", + " epoch_2_order = np.zeros(n_epochs)\n", + "\n", + " for epoch in range(n_epochs):\n", + " # Generate training patterns and targets for each epoch\n", + " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", + "\n", + " # Forward pass through the first-order network\n", + " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", + "\n", + " patterns_tensor=patterns_tensor.requires_grad_(True)\n", + " output_first_order=output_first_order.requires_grad_(True)\n", + "\n", + " # Get max values and indices for output_first_order\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " # Get max values and indices for patterns_tensor\n", + " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + " optimizer_1.zero_grad()\n", + "\n", + " # Conditionally execute the second-order network pass and related operations\n", + " if meta:\n", + "\n", + " # Forward pass through the second-order network with inputs from the first-order network\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the second-order network (wagering decision based on comparison)\n", + " loss_2 = criterion_2(output_second_order.squeeze(), order_2_tensor[:, 0])\n", + "\n", + " optimizer_2.zero_grad()\n", + "\n", + "\n", + " # Backpropagate the second-order network's loss\n", + " loss_2.backward(retain_graph=True) 
# Allows further backpropagation for loss_1 after loss_2\n", + "\n", + " # Update second-order network weights\n", + " optimizer_2.step()\n", + "\n", + " scheduler_2.step()\n", + "\n", + " epoch_2_order[epoch] = loss_2.item()\n", + " else:\n", + " # Skip computations for the second-order network\n", + " with torch.no_grad():\n", + " # Potentially forward pass through the second-order network without tracking gradients\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", + "\n", + " num_args = get_num_args(criterion_1)\n", + "\n", + " if num_args == 2:\n", + " loss_1 = criterion_1( output_first_order , stim_present_tensor )\n", + " else:\n", + " W = first_order_network.state_dict()['fc1.weight']\n", + " loss_1 = criterion_1( W, stim_present_tensor.view(-1, 100), output_first_order,\n", + " hidden_representation, lam )\n", + "\n", + " # Backpropagate the first-order network's loss\n", + " loss_1.backward()\n", + "\n", + " # Update first-order network weights\n", + " optimizer_1.step()\n", + "\n", + " # Reset first-order optimizer gradients to zero for the next iteration\n", + "\n", + " # Update the first-order scheduler\n", + " scheduler_1.step()\n", + "\n", + " epoch_1_order[epoch] = loss_1.item()\n", + " #epoch_1_order[epoch] = loss_location.item()\n", + "\n", + " return first_order_network, second_order_network, epoch_1_order, epoch_2_order , (max_values_output_first_order[-1],\n", + " max_indices_output_first_order[-1],\n", + " max_values_patterns_tensor[-1],\n", + " max_indices_patterns_tensor[-1])\n", + "\n", + "def HOSS_evaluate(X, mu, Sigma, Aprior, Wprior):\n", + " \"\"\"\n", + " Inference on 2D Bayes net for asymmetric inference on presence vs. 
absence.\n", + " \"\"\"\n", + "\n", + " # Initialise variables and conditional prob tables\n", + " p_A = np.array([1 - Aprior, Aprior]) # prior on awareness state A\n", + " p_W_a1 = np.append(0, Wprior) # likelihood of world states W given aware, first entry is absence\n", + " p_W_a0 = np.append(1, np.zeros(len(Wprior))) # likelihood of world states W given unaware, first entry is absence\n", + " p_W = (p_W_a1 + p_W_a0) / 2 # prior on W marginalising over A (for KL)\n", + "\n", + " # Compute likelihood of observed X for each possible W (P(X|mu_w, Sigma))\n", + " lik_X_W = np.array([multivariate_normal.pdf(X, mean=mu_i, cov=Sigma) for mu_i in mu])\n", + " p_X_W = lik_X_W / lik_X_W.sum() # normalise to get P(X|W)\n", + "\n", + " # Combine with likelihood of each world state w given awareness state A\n", + " lik_W_A = np.vstack((p_X_W * p_W_a0 * p_A[0], p_X_W * p_W_a1 * p_A[1]))\n", + " post_A = lik_W_A.sum(axis=1) # sum over W\n", + " post_A = post_A / post_A.sum() # normalise\n", + "\n", + " # Posterior over W (P(W|X=x) marginalising over A)\n", + " post_W = lik_W_A.sum(axis=0) # sum over A\n", + " post_W = post_W / post_W.sum() # normalise\n", + "\n", + " # KL divergences\n", + " KL_W = (post_W * np.log(post_W / p_W)).sum()\n", + " KL_A = (post_A * np.log(post_A / p_A)).sum()\n", + "\n", + " return post_W, post_A, KL_W, KL_A" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4e5761b4", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Plotting functions\n", + "# @markdown\n", + "\n", + "def plot_testing(results_seed, discrimination_seed, seeds, title):\n", + " print(results_seed)\n", + " print(discrimination_seed)\n", + "\n", + " Testing_graph_names = [\"Suprathreshold stimulus\", \"Subthreshold stimulus\", \"Low Vision\"]\n", + "\n", + " fig, ax = plt.subplots(figsize=(14, len(results_seed[0]) * 2 + 2)) # Adjusted for added header space\n", + " ax.axis('off')\n", + " ax.axis('tight')\n", + 
"\n", + " # Define column labels\n", + " col_labels = [\"Scenario\", \"F1 SCORE\\n(2nd order network)\", \"RECALL\\n(2nd order network)\", \"PRECISION\\n(2nd order network)\", \"Discrimination Performance\\n(1st order network)\", \"ACCURACY\\n(2nd order network)\"]\n", + "\n", + " # Initialize list to hold all rows of data including headers\n", + " full_data = []\n", + "\n", + " # Calculate averages and standard deviations\n", + " for i in range(len(results_seed[0])):\n", + " metrics_list = [result[i][\"metrics\"][0] for result in results_seed] # Collect metrics for each seed\n", + " discrimination_list = [discrimination_seed[j][i] for j in range(seeds)]\n", + "\n", + " # Calculate averages and standard deviations for metrics\n", + " avg_metrics = np.mean(metrics_list, axis=0).tolist()\n", + " std_metrics = np.std(metrics_list, axis=0).tolist()\n", + "\n", + " # Calculate average and standard deviation for discrimination performance\n", + " avg_discrimination = np.mean(discrimination_list)\n", + " std_discrimination = np.std(discrimination_list)\n", + "\n", + " # Format the row with averages and standard deviations\n", + " row = [\n", + " Testing_graph_names[i],\n", + " f\"{avg_metrics[2]:.2f} ± {std_metrics[2]:.2f}\", # F1 SCORE\n", + " f\"{avg_metrics[1]:.2f} ± {std_metrics[1]:.2f}\", # RECALL\n", + " f\"{avg_metrics[0]:.2f} ± {std_metrics[0]:.2f}\", # PRECISION\n", + " f\"{avg_discrimination:.2f} ± {std_discrimination:.2f}\", # Discrimination Performance\n", + " f\"{avg_metrics[3]:.2f} ± {std_metrics[3]:.2f}\" # ACCURACY\n", + " ]\n", + " full_data.append(row)\n", + "\n", + " # Extract metric values for color scaling (excluding the first and last columns which are text)\n", + " metric_values = np.array([[float(x.split(\" ± \")[0]) for x in row[1:]] for row in full_data]) # Convert to float for color scaling\n", + " max_value = np.max(metric_values)\n", + " colors = metric_values / max_value # Normalize for color mapping\n", + "\n", + " # Prepare colors for all 
cells, defaulting to white for non-metric cells\n", + " cell_colors = [[\"white\"] * len(col_labels) for _ in range(len(full_data))]\n", + " for i, row in enumerate(colors):\n", + " cell_colors[i][1] = plt.cm.RdYlGn(row[0])\n", + " cell_colors[i][2] = plt.cm.RdYlGn(row[1])\n", + " cell_colors[i][3] = plt.cm.RdYlGn(row[2])\n", + " cell_colors[i][5] = plt.cm.RdYlGn(row[3]) # Adding color for accuracy\n", + "\n", + " # Adding color for discrimination performance\n", + " discrimination_colors = colors[:, 3]\n", + " for i, dp_color in enumerate(discrimination_colors):\n", + " cell_colors[i][4] = plt.cm.RdYlGn(dp_color)\n", + "\n", + " # Create the main table with cell colors\n", + " table = ax.table(cellText=full_data, colLabels=col_labels, loc='center', cellLoc='center', cellColours=cell_colors)\n", + " table.auto_set_font_size(False)\n", + " table.set_fontsize(10)\n", + " table.scale(1.5, 1.5)\n", + "\n", + " # Set the height of the header row to be double that of the other rows\n", + " for j, col_label in enumerate(col_labels):\n", + " cell = table[(0, j)]\n", + " cell.set_height(cell.get_height() * 2)\n", + "\n", + " # Add chance level table\n", + " chance_level_data = [[\"Chance Level\\nDiscrimination(1st)\", \"Chance Level\\nAccuracy(2nd)\"],\n", + " [\"0.010\", \"0.50\"]]\n", + "\n", + " chance_table = ax.table(cellText=chance_level_data, bbox=[1.0, 0.8, 0.3, 0.1], cellLoc='center', colWidths=[0.1, 0.1])\n", + " chance_table.auto_set_font_size(False)\n", + " chance_table.set_fontsize(10)\n", + " chance_table.scale(1.2, 1.2)\n", + "\n", + " # Set the height of the header row to be double that of the other rows in the chance level table\n", + " for j in range(len(chance_level_data[0])):\n", + " cell = chance_table[(0, j)]\n", + " cell.set_height(cell.get_height() * 2)\n", + "\n", + " plt.title(title, pad=20, fontsize=16)\n", + " plt.show()\n", + " plt.close(fig)\n", + "\n", + "\n", + "def plot_signal_max_and_indicator(patterns_tensor, plot_title=\"Training 
Signals\"):\n", + " \"\"\"\n", + " Plots the maximum values of signal units and a binary indicator for max values greater than 0.5.\n", + "\n", + " Parameters:\n", + " - patterns_tensor: A tensor containing signals, where each signal is expected to have multiple units.\n", + " \"\"\"\n", + " with plt.xkcd():\n", + "\n", + " # Calculate the maximum value of units for each signal within the patterns tensor\n", + " max_values_of_units = patterns_tensor.max(dim=1).values.cpu().numpy() # Ensure it's on CPU and in NumPy format for plotting\n", + "\n", + " # Determine the binary indicators based on the max value being greater than 0.5\n", + " binary_indicators = (max_values_of_units > 0.5).astype(int)\n", + "\n", + " # Create a figure with 2 subplots (2 rows, 1 column)\n", + " fig, axs = plt.subplots(2, 1, figsize=(8, 8))\n", + "\n", + " fig.suptitle(plot_title, fontsize=16) # Set the overall title for the plot\n", + "\n", + " # First subplot for the maximum values of each signal\n", + " axs[0].plot(range(patterns_tensor.size(0)), max_values_of_units, drawstyle='steps-mid')\n", + " axs[0].set_xlabel('Pattern Number')\n", + " axs[0].set_ylabel('Max Value of Signal Units')\n", + " axs[0].set_ylim(-0.1, 1.1) # Adjust y-axis limits for clarity\n", + " axs[0].grid(True)\n", + "\n", + " # Second subplot for the binary indicators\n", + " axs[1].plot(range(patterns_tensor.size(0)), binary_indicators, drawstyle='steps-mid', color='red')\n", + " axs[1].set_xlabel('Pattern Number')\n", + " axs[1].set_ylabel('Indicator (Max > 0.5) in each signal')\n", + " axs[1].set_ylim(-0.1, 1.1) # Adjust y-axis limits for clarity\n", + " axs[1].grid(True)\n", + "\n", + " plt.tight_layout()\n", + " plt.show()\n", + "\n", + "\n", + "def perform_quadratic_regression(epoch_list, values):\n", + " # Perform quadratic regression\n", + " coeffs = np.polyfit(epoch_list, values, 2) # Coefficients of the polynomial\n", + " y_pred = np.polyval(coeffs, epoch_list) # Evaluate the polynomial at the given x 
values\n", + " return y_pred\n", + "\n", + "\n", + "def pre_train_plots(epoch_1_order, epoch_2_order, title, max_values_indices):\n", + " \"\"\"\n", + " Plots the training progress with regression lines and scatter plots of indices and values of max elements.\n", + "\n", + " Parameters:\n", + " - epoch_list (list): List of epoch numbers.\n", + " - epoch_1_order (list): Loss values for the first-order network over epochs.\n", + " - epoch_2_order (list): Loss values for the second-order network over epochs.\n", + " - title (str): Title for the plots.\n", + " - max_values_indices (tuple): Tuple containing lists of max values and indices for both tensors.\n", + " \"\"\"\n", + " (max_values_output_first_order,\n", + " max_indices_output_first_order,\n", + " max_values_patterns_tensor,\n", + " max_indices_patterns_tensor) = max_values_indices\n", + "\n", + " # Perform quadratic regression for the loss plots\n", + " epoch_list = list(range(len(epoch_1_order)))\n", + " y_pred1 = perform_quadratic_regression(epoch_list, epoch_1_order)\n", + " y_pred2 = perform_quadratic_regression(epoch_list, epoch_2_order)\n", + "\n", + " # Set up the plot with 2 rows and 2 columns\n", + " fig, axs = plt.subplots(2, 2, figsize=(15, 10))\n", + "\n", + " # First graph for 1st Order Network\n", + " axs[0, 0].plot(epoch_list, epoch_1_order, linestyle='--', marker='o', color='g')\n", + " axs[0, 0].plot(epoch_list, y_pred1, linestyle='-', color='r', label='Quadratic Fit')\n", + " axs[0, 0].legend(['1st Order Network', 'Quadratic Fit'])\n", + " axs[0, 0].set_title('1st Order Network Loss')\n", + " axs[0, 0].set_xlabel('Epochs - Pretraining Phase')\n", + " axs[0, 0].set_ylabel('Loss')\n", + "\n", + " # Second graph for 2nd Order Network\n", + " axs[0, 1].plot(epoch_list, epoch_2_order, linestyle='--', marker='o', color='b')\n", + " axs[0, 1].plot(epoch_list, y_pred2, linestyle='-', color='r', label='Quadratic Fit')\n", + " axs[0, 1].legend(['2nd Order Network', 'Quadratic Fit'])\n", + " axs[0, 
1].set_title('2nd Order Network Loss')\n", + " axs[0, 1].set_xlabel('Epochs - Pretraining Phase')\n", + " axs[0, 1].set_ylabel('Loss')\n", + "\n", + " # Scatter plot of indices: patterns_tensor vs. output_first_order\n", + " axs[1, 0].scatter(max_indices_patterns_tensor, max_indices_output_first_order, alpha=0.5)\n", + "\n", + " # Add quadratic regression line\n", + " indices_regression = perform_quadratic_regression(max_indices_patterns_tensor, max_indices_output_first_order)\n", + " axs[1, 0].plot(max_indices_patterns_tensor, indices_regression, color='skyblue', linestyle='--', label='Quadratic Fit')\n", + "\n", + " axs[1, 0].set_title('Stimuli location: First Order Input vs. First Order Output')\n", + " axs[1, 0].set_xlabel('First Order Input Indices')\n", + " axs[1, 0].set_ylabel('First Order Output Indices')\n", + " axs[1, 0].legend()\n", + "\n", + " # Scatter plot of values: patterns_tensor vs. output_first_order\n", + " axs[1, 1].scatter(max_values_patterns_tensor, max_values_output_first_order, alpha=0.5)\n", + "\n", + " # Add quadratic regression line\n", + " values_regression = perform_quadratic_regression(max_values_patterns_tensor, max_values_output_first_order)\n", + " axs[1, 1].plot(max_values_patterns_tensor, values_regression, color='skyblue', linestyle='--', label='Quadratic Fit')\n", + "\n", + " axs[1, 1].set_title('Stimuli Values: First Order Input vs. 
First Order Output')\n", + " axs[1, 1].set_xlabel('First Order Input Values')\n", + " axs[1, 1].set_ylabel('First Order Output Values')\n", + " axs[1, 1].legend()\n", + "\n", + " plt.suptitle(title, fontsize=16, y=1.02)\n", + "\n", + " # Display the plots in a 2x2 grid\n", + " plt.tight_layout()\n", + " plt.savefig('Blindsight_Pre_training_Loss_{}.png'.format(title.replace(\" \", \"_\").replace(\"/\", \"_\")), bbox_inches='tight')\n", + " plt.show()\n", + " plt.close(fig)\n", + "\n", + "# Function to configure the training environment and load the models\n", + "def config_training(first_order_network, second_order_network, hidden, factor, gelu):\n", + " \"\"\"\n", + " Configures the training environment by saving the state of the given models and loading them back.\n", + " Initializes testing patterns for evaluation.\n", + "\n", + " Parameters:\n", + " - first_order_network: The first order network instance.\n", + " - second_order_network: The second order network instance.\n", + " - hidden: Number of hidden units in the first order network.\n", + " - factor: Factor influencing the network's architecture.\n", + " - gelu: Activation function to be used in the network.\n", + "\n", + " Returns:\n", + " - Tuple of testing patterns, number of samples in the testing patterns, and the loaded model instances.\n", + " \"\"\"\n", + " # Paths where the models' states will be saved\n", + " PATH = './cnn1.pth'\n", + " PATH_2 = './cnn2.pth'\n", + "\n", + " # Save the weights of the pretrained networks to the specified paths\n", + " torch.save(first_order_network.state_dict(), PATH)\n", + " torch.save(second_order_network.state_dict(), PATH_2)\n", + "\n", + " # Generating testing patterns for three different sets\n", + " First_set, First_set_targets = create_patterns(0,factor)\n", + " Second_set, Second_set_targets = create_patterns(1,factor)\n", + " Third_set, Third_set_targets = create_patterns(2,factor)\n", + "\n", + " # Aggregate testing patterns and their targets for ease of 
access\n", + "    Testing_patterns = [[First_set, First_set_targets], [Second_set, Second_set_targets], [Third_set, Third_set_targets]]\n", + "\n", + "    # Determine the number of samples from the first set (assumed consistent across all sets)\n", + "    n_samples = len(Testing_patterns[0][0])\n", + "\n", + "    # Initialize and load the saved states into model instances\n", + "    loaded_model = FirstOrderNetwork(hidden, factor, gelu)\n", + "    loaded_model_2 = SecondOrderNetwork(gelu)\n", + "\n", + "    loaded_model.load_state_dict(torch.load(PATH))\n", + "    loaded_model_2.load_state_dict(torch.load(PATH_2))\n", + "\n", + "    # Ensure the models are moved to the appropriate device (CPU/GPU) and set to evaluation mode\n", + "    loaded_model.to(device)\n", + "    loaded_model_2.to(device)\n", + "\n", + "    loaded_model.eval()\n", + "    loaded_model_2.eval()\n", + "\n", + "    return Testing_patterns, n_samples, loaded_model, loaded_model_2" + ] + }, + { + "cell_type": "markdown", + "id": "de910a5b", + "metadata": { + "execution": {} + }, + "source": [ + "---\n", + "\n", + "# Introduction\n", + "\n", + "This bonus tutorial extends the content of Tutorial 1 on the theme of consciousness. At the end of Section 2, we discussed and implemented several ideas around first-order models and briefly mentioned second-order models. In this tutorial, we develop these ideas further and model the effects of blindsight, the phenomenon introduced earlier today, in which patients have no conscious experience of sight yet are able to navigate around objects (showing that their brains process sensory information, although it does not reach the level of subjective experience). We first introduce the coding of the first-order model, followed by the second-order model. Then we show you some ways to plot the results from these models.\n", + "\n", + "After this we end on some further high-level thoughts on the theme of consciousness. 
\n" + ] + }, + { + "cell_type": "markdown", + "id": "76dd7488-6558-4022-8541-22765f2967c6", + "metadata": { + "execution": {} + }, + "source": [ + "## Section 1: Train a First-Order Network\n", + "\n", + "This section invites you to engage with a straightforward, auto-generated dataset on blindsight, originally introduced by [Pasquali et al. in 2010](https://www.sciencedirect.com/science/article/abs/pii/S0010027710001794). Blindsight is a fascinating condition where individuals who are cortically blind due to damage in their primary visual cortex can still respond to visual stimuli without conscious perception. This intriguing phenomenon underscores the intricate nature of sensory processing and the brain's ability to process information without conscious awareness." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c79b0a2-8e12-44ea-a685-bba788f6685d", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Visualize the autogenerated data\n", + "factor=2\n", + "initialize_global()\n", + "set_pre, _ = create_patterns(0,factor)\n", + "plot_signal_max_and_indicator(set_pre.detach().cpu(), \"Example - Pre training dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "cff70408-8662-43f5-b930-fc2a6ffca323", + "metadata": { + "execution": {} + }, + "source": [ + "The pre-training dataset for the network consisted of 200 patterns. These were evenly divided: half were purely noise (with unit activations randomly chosen between 0.0 and 0.02), and the other half represented potential stimuli. In the stimulus patterns, 99 out of 100 units had activations ranging between 0.0 and 0.02, with one unique unit having an activation between 0.0 and 1.0." 
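The generation scheme just described can be sketched in a few stand-alone lines (an illustrative simplification of the notebook's `generate_patterns` helper, which additionally handles conditions, targets, and tensor conversion; variable names here are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
num_units = 100

# Noise pattern: all 100 unit activations drawn uniformly from [0.0, 0.02)
noise_pattern = rng.uniform(0.0, 0.02, num_units)

# Stimulus pattern: the same background noise, except one randomly chosen
# unit whose activation is drawn uniformly from [0.0, 1.0)
stimulus_pattern = rng.uniform(0.0, 0.02, num_units)
stimulus_unit = rng.integers(num_units)
stimulus_pattern[stimulus_unit] = rng.uniform(0.0, 1.0)
```

Repeating this 100 times for each half yields the 200-pattern pre-training set.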
+ ] + }, + { + "cell_type": "markdown", + "id": "0f45662a-08b4-44a4-89e1-200fc0c9cddb", + "metadata": { + "execution": {} + }, + "source": [ + "**Testing patterns**\n", + "\n", + "As we have seen before, the network underwent evaluations under three distinct conditions, each modifying the signal-to-noise ratio in a unique way to explore different degrees and types of blindness.\n", + "\n", + "Suprathreshold stimulus condition: here, the network was exposed to the identical set of 200 patterns used during pre-training, testing the network's response to familiar inputs.\n", + "\n", + "Subthreshold stimulus condition (blindsight simulation): this condition aimed to mimic blindsight. It was achieved by introducing a slight noise increment (+0.0012) to every input of the first-order network, barring the one designated as the stimulus. This setup tested the network's ability to discern faint signals amidst noise.\n", + "\n", + "Low vision condition: to simulate low vision, the activation levels of the stimuli were reduced. Unlike the range from 0.0 to 1.0 used in pre-training, the stimuli's activation levels were adjusted to span from 0.0 to 0.3. This condition examined the network's capability to recognize stimuli with diminished intensity." 
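The three testing manipulations described above can be sketched in a few lines. This is a standalone NumPy illustration with hypothetical names (`make_base_patterns`, `stim_idx`); the tutorial's own `create_patterns` is the authoritative implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_base_patterns(n=200, units=100):
    """Half noise-only patterns, half with one randomly placed stimulus unit."""
    patterns = rng.uniform(0.0, 0.02, size=(n, units))
    for row in patterns[n // 2:]:                 # second half carries a stimulus
        row[rng.integers(units)] = rng.uniform(0.0, 1.0)
    return patterns

supra = make_base_patterns()                      # suprathreshold: the pre-training set itself
stim_idx = supra.argmax(axis=1)                   # strongest unit per pattern (noise rows: just the largest noise unit)

sub = supra + 0.0012                              # subthreshold: raise the noise floor ...
sub[np.arange(len(sub)), stim_idx] -= 0.0012      # ... everywhere except the stimulus unit

low = supra.copy()                                # low vision: stimulus activations rescaled
low[np.arange(len(low)), stim_idx] *= 0.3         # from a 0.0-1.0 range down to 0.0-0.3
```

Note that the stimulus unit keeps its original activation in the subthreshold set; only the surrounding units become noisier, which is what makes the signal harder to pick out.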
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db58d78b-17d8-4651-801a-f06e568a7322", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "factor=2\n", + "# Compare your results with the patterns generated below\n", + "set_1, _ = create_patterns(0,factor)\n", + "set_2, _ = create_patterns(1,factor)\n", + "set_3, _ = create_patterns(2,factor)\n", + "\n", + "# Plot\n", + "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", + "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", + "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "cd5c13e0-75e8-45c2-b1be-70496041364b", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 1: Building a network for a blindsight situation\n", + "\n", + "In this activity, we'll construct a neural network model using our auto-generated dataset, focusing on blindsight scenarios. The model will primarily consist of fully connected layers, establishing a straightforward, first-order network. The aim here is to assess the basic network's performance.\n", + "\n", + "**Steps to follow**\n", + "\n", + "1. Examine the network architecture: understand the structure of the neural network you're about to work with.\n", + "2. Visualize loss metrics: observe and analyze the network's performance during pre-training by visualizing the loss over epochs.\n", + "3. Evaluate the model: use the provided code snippets to calculate and interpret the model's accuracy, recall, and F1-score, giving you insight into the network's capabilities.\n", + "\n", + "**Understanding the process**\n", + "\n", + "The goal is to gain a thorough comprehension of the network's architecture and to interpret the pre-training results visually. 
This will provide a clearer picture of the model's potential and limitations.\n", + "\n", + "The network is designed as a backpropagation autoassociator. It features a 100-unit input layer, directly linked to a 40-unit hidden layer, which in turn connects to a 100-unit output layer. Initial connection weights are set within the range of -1.0 to 1.0 for the first-order network. To mitigate overfitting, dropout is employed within the network architecture. The architecture includes a configurable activation function. This flexibility allows for adjustments and tuning in Activity 3, aiming for optimal model performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94d0bcaf-8b49-4e35-b0d2-1b9dcc98b182", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "class FirstOrderNetwork(nn.Module):\n", + " def __init__(self, hidden_units, data_factor, use_gelu):\n", + " \"\"\"\n", + " Initializes the FirstOrderNetwork with specific configurations.\n", + "\n", + " Parameters:\n", + " - hidden_units (int): The number of units in the hidden layer.\n", + " - data_factor (int): Factor to scale the amount of data processed.\n", + " A factor of 1 indicates the default data amount,\n", + " while 10 indicates 10 times the default amount.\n", + " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", + " \"\"\"\n", + " super(FirstOrderNetwork, self).__init__()\n", + "\n", + " # Define the encoder, hidden, and decoder layers with specified units\n", + "\n", + " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", + " self.hidden= nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", + " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", + "\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + "\n", + "\n", + " # Dropout layer to prevent overfitting\n", + " self.dropout = nn.Dropout(0.1)\n", + "\n", + " # Set the data factor\n", + " 
self.data_factor = data_factor\n", + "\n", + " # Other activation functions for various purposes\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize network weights\n", + " self.initialize_weights()\n", + "\n", + " def initialize_weights(self):\n", + " \"\"\"Initializes weights of the encoder, hidden, and decoder layers uniformly.\"\"\"\n", + " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", + "\n", + " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", + " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", + "\n", + " def encoder(self, x):\n", + " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", + " return h1\n", + "\n", + " def decoder(self,z):\n", + " #h2 = self.relu(self.hidden(z))\n", + " h2 = self.sigmoid(self.fc2(z))\n", + " return h2\n", + "\n", + "\n", + " def forward(self, x):\n", + " \"\"\"\n", + " Defines the forward pass through the network.\n", + "\n", + " Parameters:\n", + " - x (Tensor): The input tensor to the network.\n", + "\n", + " Returns:\n", + " - Tensor: The output of the network after passing through the layers and activations.\n", + " \"\"\"\n", + " h1 = self.encoder(x)\n", + " h2 = self.decoder(h1)\n", + "\n", + " return h1 , h2" + ] + }, + { + "cell_type": "markdown", + "id": "83e07f1a-540b-4bfa-8e9b-f114319f1f96", + "metadata": { + "execution": {} + }, + "source": [ + "For now, we will train the first-order network only." 
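The 100 → 40 → 100 shape flow of the autoassociator can be checked with a minimal standalone sketch. This is plain NumPy mirroring only the layer sizes, the [-1, 1] weight initialization, and the ReLU/sigmoid activations (dropout omitted); `FirstOrderNetwork` above is the real model:

```python
import numpy as np

rng = np.random.default_rng(0)

W_enc = rng.uniform(-1.0, 1.0, size=(100, 40))   # encoder weights in [-1, 1]
W_dec = rng.uniform(-1.0, 1.0, size=(40, 100))   # decoder weights in [-1, 1]

x = rng.uniform(0.0, 0.02, size=(8, 100))        # a batch of 8 noise-level input patterns
h = np.maximum(x @ W_enc, 0.0)                   # ReLU hidden representation (40 units)
y = 1.0 / (1.0 + np.exp(-(h @ W_dec)))           # sigmoid reconstruction (100 units)

print(h.shape, y.shape)                          # hidden and reconstruction shapes
```

The sigmoid keeps reconstructions in (0, 1), matching the activation range of the input patterns.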
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4202ab0d", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "seeds=15\n", + "\n", + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Hyperparameters\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + "factor=2\n", + "gelu=False\n", + "gam=0.98\n", + "meta=True\n", + "stepsize=25\n", + "\n", + "for i in range(seeds):\n", + " print(f\"Seed {i}\")\n", + "\n", + " # Compare your results with the patterns generate below\n", + " initialize_global()\n", + "\n", + " # Prepare networks, loss functions, optimizers, and schedulers for pre-training\n", + " first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2 = prepare_pre_training(hidden, factor, gelu, stepsize, gam)\n", + "\n", + " # Conduct pre-training for both the first-order and second-order networks\n", + " first_order_network_pre, second_order_network_pre, epoch_1_order, epoch_2_order , max_value_indices = pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta)\n", + "\n", + " # Plot the training progress of both networks to visualize performance and learning trends\n", + " pre_train_plots(epoch_1_order, epoch_2_order, f\"1st & 2nd Order Networks - Seed {i}\" , max_value_indices )\n", + "\n", + " # Configuration step for the main training phase or evaluation\n", + " testing_patterns, n_samples = get_test_patterns(factor)\n", + "\n", + " # Function to test the model using the configured testing patterns\n", + " first_order_network_pre.eval()\n", + " second_order_network_pre.eval()\n", + " f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, first_order_network_pre, 
second_order_network_pre,factor)\n", + " results_seed.append(results_for_plotting)\n", + " discrimination_seed.append(discrimination_performances)\n", + "\n", + "plot_testing(results_seed, discrimination_seed, seeds, \"Test Results\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bfade3d-6385-459c-8f07-e3017264455a", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "\n", + "# Hyperparameters\n", + "global optimizer ,n_epochs , learning_rate_1\n", + "learning_rate_1 = 0.5\n", + "n_epochs = 100\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + "factor=2\n", + "gelu=False\n", + "gam=0.98\n", + "meta=True\n", + "stepsize=25\n", + "initialize_global()\n", + "\n", + "\n", + "# Networks instantiation\n", + "first_order_network = FirstOrderNetwork(hidden,factor,gelu).to(device)\n", + "second_order_network = SecondOrderNetwork(gelu).to(device) # We define it, but won't use it until activity 3\n", + "\n", + "# Loss function\n", + "criterion_1 = CAE_loss\n", + "\n", + "# Optimizer\n", + "optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", + "\n", + "# Learning rate schedulers\n", + "scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", + "\n", + "max_values_output_first_order = []\n", + "max_indices_output_first_order = []\n", + "max_values_patterns_tensor = []\n", + "max_indices_patterns_tensor = []\n", + "\n", + "# Training loop\n", + "for epoch in range(n_epochs):\n", + " # Generate training patterns and targets for each epoch.\n", + " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", + "\n", + " # Forward pass through the first-order network\n", + " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", + "\n", + " 
output_first_order=output_first_order.requires_grad_(True)\n", + "\n", + " # Skip computations for the second-order network\n", + " with torch.no_grad():\n", + "\n", + " # Potentially forward pass through the second-order network without tracking gradients\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", + " W = first_order_network.state_dict()['fc1.weight']\n", + " loss_1 = criterion_1( W, stim_present_tensor.view(-1, 100), output_first_order,\n", + " hidden_representation, lam )\n", + " # Backpropagate the first-order network's loss\n", + " loss_1.backward()\n", + "\n", + " # Update first-order network weights\n", + " optimizer_1.step()\n", + "\n", + " # Reset first-order optimizer gradients to zero for the next iteration\n", + " optimizer_1.zero_grad()\n", + "\n", + " # Update the first-order scheduler\n", + " scheduler_1.step()\n", + "\n", + " epoch_1_order[epoch] = loss_1.item()\n", + "\n", + " # Get max values and indices for output_first_order\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " # Get max values and indices for patterns_tensor\n", + " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + "\n", + "max_value_indices = (max_values_output_first_order[-1],\n", + " max_indices_output_first_order[-1],\n", + " max_values_patterns_tensor[-1],\n", + " max_indices_patterns_tensor[-1])\n", + "\n", + "\n", + "# Plot training loss curve\n", + "pre_train_plots(epoch_1_order, epoch_2_order, \"1st & 2nd Order Networks\" , 
max_value_indices )" + ] + }, + { + "cell_type": "markdown", + "id": "fa7d1ce9-da1b-4f78-b388-47bfcd50c6dd", + "metadata": { + "execution": {} + }, + "source": [ + "### Testing under 3 Blindsight Conditions\n", + "\n", + "We will now use the testing auto-generated datasets from activity 1 to test the network's performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2affe162-f4d9-495f-862a-65b0f50ca5ef", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Prepare networks for testing by calling the configuration function\n", + "testing_patterns, n_samples, loaded_model, loaded_model_2 = config_training(first_order_network, second_order_network, hidden, factor, gelu)\n", + "\n", + "# Perform testing using the defined function and plot the results\n", + "f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, loaded_model, loaded_model_2,factor)\n", + "\n", + "results_seed.append(results_for_plotting)\n", + "discrimination_seed.append(discrimination_performances)\n", + "# Assuming plot_testing is defined, call it to display results\n", + "plot_testing(results_seed,discrimination_seed, 1, \"Seed\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d18302e-4657-4732-b6ef-f7439d2bb2fd", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_First_order_network\")" + ] + }, + { + "cell_type": "markdown", + "id": "96579a08-3c95-4dfe-9908-fabe1bb146d0", + "metadata": { + "execution": {} + }, + "source": [ + "## Section 2: Train a Second-Order network" + ] + }, + { + "cell_type": "markdown", + "id": "caac41bc-5a93-43bf-aede-7c1e87e83fbd", + "metadata": { + "execution": {} + }, + "source": [ + "Having previously examined the first-order 
network, we now switch to the second-order network, described in more detail in Tutorial 1 (please revisit the text and video content there if you need to recap the concepts or the difference between these models).\n", + "\n", + "To study this, we use a simulated dataset that mimics the conditions of blindsight. This dataset contains 400 patterns, equally split between two types:\n", + "\n", + "- **Random noise patterns**: low activations ranging between 0.0 and 0.02.\n", + "- **Designed stimulus patterns**: each pattern includes one unit with a higher activation level, varying between 0.0 and 1.0.\n", + "\n", + "This dataset allows us to test hypotheses concerning how sensory processing and network responses adapt under different conditions of visual impairment.\n", + "\n", + "We have three main testing scenarios, each designed to alter the signal-to-noise ratio to simulate different levels of visual impairment:\n", + "\n", + "- **Suprathreshold stimulus condition**: here, the network is tested against familiar patterns used during training to assess its response to known stimuli.\n", + "- **Subthreshold stimulus condition**: this condition slightly increases the noise level, akin to actual blindsight conditions, testing the network's capability to discern subtle signals.\n", + "- **Low vision condition**: the intensity of stimuli is decreased to evaluate how well the network performs with significantly reduced sensory input." 
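To build intuition for how these scenarios move the signal-to-noise ratio, here is a back-of-the-envelope calculation. The midpoint values (0.01 for noise, 0.5 for a stimulus unit) are assumptions for illustration, not measurements from the tutorial's data:

```python
# Rough midpoint values, assumed for illustration only
noise_mean = 0.01                            # midpoint of the 0.0-0.02 noise range
stim_mean = 0.5                              # midpoint of the 0.0-1.0 stimulus range

snr_supra = stim_mean / noise_mean           # suprathreshold: training regime, ~50
snr_sub = stim_mean / (noise_mean + 0.0012)  # subthreshold: raised noise floor, ~45
snr_low = (0.3 * stim_mean) / noise_mean     # low vision: attenuated stimulus, ~15
```

Even the small +0.0012 noise increment measurably lowers the ratio, while attenuating the stimulus to 30% of its range cuts it by far more, which is why the low vision condition is the harder test.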
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b549db9-e8b0-4c49-89d2-b7324b3a4ed1", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "factor=2\n", + "\n", + "initialize_global()\n", + "set_1, _ = create_patterns(0,factor)\n", + "set_2, _ = create_patterns(1,factor)\n", + "set_3, _ = create_patterns(2,factor)\n", + "\n", + "# Plot\n", + "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", + "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", + "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "96a91af5-c498-429d-a407-afa66d7444db", + "metadata": { + "execution": {} + }, + "source": [ + "The first-order network model lays the groundwork for our experiments and is structured as follows:\n", + "\n", + "- Input layer: consists of 100 units representing either noise or stimulus patterns.\n", + "- Hidden layer: includes a 40-unit layer tasked with processing the inputs.\n", + "- Output layer: comprises 100 units where the responses to stimuli are recorded.\n", + "- Dropout and activation: includes dropout layers to prevent overfitting and a temperature-controlled activation function to fine-tune response sharpness.\n", + "\n", + "The primary aim of the first-order network is to accurately capture and react to the input patterns, setting a baseline for comparison with more complex models." + ] + }, + { + "cell_type": "markdown", + "id": "768e074d-1a07-4f3e-8a5d-de31849e7730", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 2: Developing a Second-Order Network\n", + "\n", + "Your task is to expand upon the first-order network by integrating a second-order network that incorporates a metacognitive layer assessing the predictions of the first-order network. 
This metacognitive layer introduces a wagering mechanism, wherein the network \"bets\" on its confidence in its predictions. \n", + "\n", + "- The first-order network is designed as an autoencoder, a type of neural network trained to reconstruct the input stimulus. The autoencoder consists of an encoder that compresses the input into a latent representation and a decoder that reconstructs the input from this representation.\n", + "- The second-order network, or metacognitive layer, operates by examining the difference (delta) between the original input and the output generated by the autoencoder. This difference provides insight into the reconstruction error, which is a measure of how accurately the autoencoder has learned to replicate the input data. By evaluating this reconstruction error, the second-order network can make a judgement about the certainty of the first-order network's predictions.\n", + "\n", + "These are the steps for completion:\n", + "\n", + "1. Architectural development: grasp the underlying principles of a second-order network and complete the architectural code.\n", + "2. Performance evaluation: visualize training losses and test the model using provided code, assessing its initial performance.\n", + "3. Model fine-tuning: leveraging the provided training function, experiment with fine-tuning the model to enhance its accuracy and efficiency.\n", + "\n", + "The second-order network is structured as a feedforward backpropagation network.\n", + "\n", + "- Input layer: comprises a 100-unit comparison matrix. This matrix quantifies the discrepancy between each corresponding pair of input and output units from the first-order network. For example, if an input unit and its corresponding output unit have activations of 0.6 and 0.7, respectively, the comparison unit's activation would be -0.1. 
This setup essentially encodes the prediction error of the first-order network's outputs as an input pattern for the second-order network.\n", + "- Output layer: consists of two units representing \"high\" and \"low\" wagers, indicating the network's confidence in its predictions. The initial weights for these output units range between 0.0 and 0.1.\n", + "- Comparator weights: set to 1.0 for connections from the first-order input layer to the comparison matrix, and -1.0 for connections from the first-order output layer. This configuration emphasizes the differential error as a critical input for the second-order decision-making process.\n", + "\n", + "The second-order network's novel approach uses the error generated by the first-order network as a direct input for making decisions—specifically, wagering on the confidence of its outputs. This methodology reflects a metacognitive layer of processing, akin to evaluating one's confidence in their answers or predictions.\n", + "\n", + "By exploring these adjustments, you can optimize the network's functionality, making it a powerful tool for understanding and simulating complex cognitive phenomena like blindsight." 
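The fixed +1.0 / -1.0 comparator weights described above reduce to an elementwise subtraction. A minimal numeric check of the text's 0.6 vs. 0.7 example, using hypothetical toy vectors:

```python
import numpy as np

first_order_input = np.array([0.60, 0.02, 0.90])   # toy first-order input units
first_order_output = np.array([0.70, 0.01, 0.90])  # toy reconstructions of those units

# +1.0 weights from the input layer, -1.0 from the output layer -> signed error
comparison = 1.0 * first_order_input - 1.0 * first_order_output
# the text's example: input 0.6 vs. output 0.7 gives a comparison activation of -0.1
```

A perfectly reconstructed unit (0.90 vs. 0.90) yields a comparison activation of zero, so the second-order network's wager is driven only by where the autoencoder errs.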
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2c37e357-e5e6-40b2-8507-f83161f5d85f", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "class SecondOrderNetwork(nn.Module):\n", + " def __init__(self, use_gelu):\n", + " super(SecondOrderNetwork, self).__init__()\n", + " # Define a linear layer for comparing the difference between input and output of the first-order network\n", + " self.comparison_layer = nn.Linear(100, 100)\n", + "\n", + " # Linear layer for determining wagers, mapping from 100 features to a single output\n", + " self.wager = nn.Linear(100, 1)\n", + "\n", + " # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " # Select activation function based on the `use_gelu` flag\n", + " self.activation = torch.relu\n", + "\n", + " # Additional activation functions for potential use in network operations\n", + " self.sigmoid = torch.sigmoid\n", + "\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize the weights of the network\n", + " self._init_weights()\n", + "\n", + " def _init_weights(self):\n", + " # Uniformly initialize weights for the comparison and wager layers\n", + " init.uniform_(self.comparison_layer.weight, -1.0, 1.0)\n", + " init.uniform_(self.wager.weight, 0.0, 0.1)\n", + "\n", + " def forward(self, first_order_input, first_order_output):\n", + " ############################################################\n", + " # Fill in the wager value\n", + " # Applying dropout and sigmoid activation to the output of the wager layer\n", + " raise NotImplementedError(\"Student exercise\")\n", + " 
############################################################\n", + "\n", + " # Calculate the difference between the first-order input and output\n", + " comparison_matrix = first_order_input - first_order_output\n", + "\n", + " #Another option is to directly calculate the per unit MSE to use as input for the comparator matrix\n", + " #comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input)\n", + "\n", + " # Pass the difference through the comparison layer and apply the chosen activation function\n", + " comparison_out=self.dropout(self.activation(self.comparison_layer(comparison_matrix)))\n", + "\n", + " # Calculate the wager value, applying dropout and sigmoid activation to the output of the wager layer\n", + " wager = ...\n", + "\n", + " return wager" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d931cb5-a87a-48be-8760-79512b9d88f7", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# to_remove solution\n", + "class SecondOrderNetwork(nn.Module):\n", + " def __init__(self, use_gelu):\n", + " super(SecondOrderNetwork, self).__init__()\n", + " # Define a linear layer for comparing the difference between input and output of the first-order network\n", + " self.comparison_layer = nn.Linear(100, 100)\n", + "\n", + " # Linear layer for determining wagers, mapping from 100 features to a single output\n", + " self.wager = nn.Linear(100, 1)\n", + "\n", + " # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " # Select activation function based on the `use_gelu` flag\n", + " self.activation = torch.relu\n", + "\n", + " # Additional activation functions for potential use in network operations\n", + " self.sigmoid = torch.sigmoid\n", + "\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize the weights of the network\n", + " self._init_weights()\n", + "\n", + " def 
_init_weights(self):\n", + " # Uniformly initialize weights for the comparison and wager layers\n", + " init.uniform_(self.comparison_layer.weight, -1.0, 1.0)\n", + " init.uniform_(self.wager.weight, 0.0, 0.1)\n", + "\n", + " def forward(self, first_order_input, first_order_output):\n", + " # Calculate the difference between the first-order input and output\n", + " comparison_matrix = first_order_input - first_order_output\n", + "\n", + " #Another option is to directly calculate the per unit MSE to use as input for the comparator matrix\n", + " #comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input)\n", + "\n", + " # Pass the difference through the comparison layer and apply the chosen activation function\n", + " comparison_out=self.dropout(self.activation(self.comparison_layer(comparison_matrix)))\n", + "\n", + " # Calculate the wager value, applying dropout and sigmoid activation to the output of the wager layer\n", + " wager = self.sigmoid(self.wager(comparison_out))\n", + "\n", + " return wager" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "736319ec-2a17-4d80-bb04-b9507ba5db5d", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Second_Order_Network\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "947c8550-a40d-43aa-bfd6-1eb8cead339f", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# First order network instantiation\n", + "first_order_network = FirstOrderNetwork(hidden, factor, gelu).to(device)\n", + "\n", + "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "seeds=15\n", + "\n", + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Hyperparameters\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + "factor=2\n", + "gelu=False\n", + "gam=0.98\n", + "meta=True\n", + "stepsize=25\n", 
+ "\n", + "for i in range(seeds):\n", + " print(f\"Seed {i}\")\n", + "\n", + " # Compare your results with the patterns generate below\n", + " initialize_global()\n", + "\n", + " # Prepare networks, loss functions, optimizers, and schedulers for pre-training\n", + " first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2 = prepare_pre_training(hidden, factor, gelu, stepsize, gam)\n", + "\n", + " # Conduct pre-training for both the first-order and second-order networks\n", + " first_order_network_pre, second_order_network_pre, epoch_1_order, epoch_2_order , max_value_indices = pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta)\n", + "\n", + " # Plot the training progress of both networks to visualize performance and learning trends\n", + " pre_train_plots(epoch_1_order, epoch_2_order, f\"1st & 2nd Order Networks - Seed {i}\" , max_value_indices )\n", + "\n", + " # Configuration step for the main training phase or evaluation\n", + " testing_patterns, n_samples = get_test_patterns(factor)\n", + "\n", + " # Function to test the model using the configured testing patterns\n", + " first_order_network_pre.eval()\n", + " second_order_network_pre.eval()\n", + " f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, first_order_network_pre, second_order_network_pre,factor)\n", + " results_seed.append(results_for_plotting)\n", + " discrimination_seed.append(discrimination_performances)\n", + "\n", + "plot_testing(results_seed, discrimination_seed, seeds, \"Test Results\")" + ] + }, + { + "cell_type": "markdown", + "id": "2047ee8a-4ebc-41dc-a77a-4e17f7c74947", + "metadata": { + "execution": {} + }, + "source": [ + "### Discussion point\n", + "\n", + "Let's dive into the outcomes!\n", + "\n", + "- Did you notice any variations 
between the two models?\n", + "- Can you explain how these differences influenced the performance?\n", + "- What role does a second-order network play, and in which situations would it be more effective?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55115815-beb2-4f19-a598-9b129ff87637", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Discussion_Point_Second_Order_Network\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a5a880a9-a069-4e0f-a481-f3b85b6a3952", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Video 1: Second Order Network\n", + "\n", + "from ipywidgets import widgets\n", + "from IPython.display import YouTubeVideo\n", + "from IPython.display import IFrame\n", + "from IPython.display import display\n", + "\n", + "class PlayVideo(IFrame):\n", + " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", + " self.id = id\n", + " if source == 'Bilibili':\n", + " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", + " elif source == 'Osf':\n", + " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", + " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", + "\n", + "def display_videos(video_ids, W=400, H=300, fs=1):\n", + " tab_contents = []\n", + " for i, video_id in enumerate(video_ids):\n", + " out = widgets.Output()\n", + " with out:\n", + " if video_ids[i][0] == 'Youtube':\n", + " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", + " height=H, fs=fs, rel=0)\n", + " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", + " else:\n", + " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", + " height=H, fs=fs, autoplay=False)\n", + " if video_ids[i][0] == 'Bilibili':\n", + " 
print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", + " elif video_ids[i][0] == 'Osf':\n", + " print(f'Video available at https://osf.io/{video.id}')\n", + " display(video)\n", + " tab_contents.append(out)\n", + " return tab_contents\n", + "\n", + "video_ids = [('Youtube', 'lHRP14mxXv8'), ('Bilibili', 'BV1jM4m1S7ek')]\n", + "tab_contents = display_videos(video_ids, W=854, H=480)\n", + "tabs = widgets.Tab()\n", + "tabs.children = tab_contents\n", + "for i in range(len(tab_contents)):\n", + " tabs.set_title(i, video_ids[i][0])\n", + "display(tabs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a54d67b-507e-4a8a-9715-0aacdeb06f26", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Video_1\")" + ] + }, + { + "cell_type": "markdown", + "id": "8a694f1e-3f32-48fc-bce8-0b544d43ca62", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 3: Plot Surfaces for Content / Awareness Inference\n", + "\n", + "To explore the properties of the HOSS model, we can simulate inference at different levels of the hierarchy over the full 2D space of possible input X's. The left panel below shows that the probability of awareness (of any stimulus contents) rises in a graded manner from the lower left corner of the graph (low activation of any feature) to the upper right (high activation of both features). In contrast, the right panel shows that confidence in making a discrimination response (e.g. rightward vs. leftward) increases away from the major diagonal, as the model becomes sure that the sample was generated by either a leftward or rightward tilted stimulus.\n", + "\n", + "Together, the two surfaces make predictions about the relationships we might see between discrimination confidence and awareness in a simple psychophysics experiment. 
One notable prediction is that discrimination could still be possible - and lead to some degree of confidence - even when the higher-order node is \"reporting\" unawareness of the stimulus.\n", + "\n", + "Now, let's get hands on and plot those auto-generated patterns!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "77fbfe70", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "def HOSS_evaluate(X, mu, Sigma, Aprior, Wprior):\n", + " \"\"\"\n", + " Inference on 2D Bayes net for asymmetric inference on presence vs. absence.\n", + " \"\"\"\n", + "\n", + " # Initialise variables and conditional prob tables\n", + " p_A = np.array([1 - Aprior, Aprior]) # prior on awareness state A\n", + " p_W_a1 = np.append(0, Wprior) # likelihood of world states W given aware, first entry is absence\n", + " p_W_a0 = np.append(1, np.zeros(len(Wprior))) # likelihood of world states W given unaware, first entry is absence\n", + " p_W = (p_W_a1 + p_W_a0) / 2 # prior on W marginalising over A (for KL)\n", + "\n", + " # Compute likelihood of observed X for each possible W (P(X|mu_w, Sigma))\n", + " lik_X_W = np.array([multivariate_normal.pdf(X, mean=mu_i, cov=Sigma) for mu_i in mu])\n", + " p_X_W = lik_X_W / lik_X_W.sum() # normalise to get P(X|W)\n", + "\n", + " # Combine with likelihood of each world state w given awareness state A\n", + " lik_W_A = np.vstack((p_X_W * p_W_a0 * p_A[0], p_X_W * p_W_a1 * p_A[1]))\n", + " post_A = lik_W_A.sum(axis=1) # sum over W\n", + " post_A = post_A / post_A.sum() # normalise\n", + "\n", + " # Posterior over W (P(W|X=x) marginalising over A)\n", + " post_W = lik_W_A.sum(axis=0) # sum over A\n", + " post_W = post_W / post_W.sum() # normalise\n", + "\n", + " # KL divergences\n", + " KL_W = (post_W * np.log(post_W / p_W)).sum()\n", + " KL_A = (post_A * np.log(post_A / p_A)).sum()\n", + "\n", + " return post_W, post_A, KL_W, KL_A" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"31503073-a7c0-4502-8d94-5ffa47a22926", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the grid\n", + "xgrid = np.arange(0, 2.01, 0.01)\n", + "\n", + "# Define the means for the Gaussian distributions\n", + "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", + "\n", + "# Define the covariance matrix\n", + "Sigma = np.array([[1, 0], [0, 1]])\n", + "\n", + "# Prior probabilities\n", + "Wprior = np.array([0.5, 0.5])\n", + "Aprior = 0.5\n", + "\n", + "# Initialize arrays to hold confidence and posterior probability\n", + "confW = np.zeros((len(xgrid), len(xgrid)))\n", + "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", + "\n", + "# Compute confidence and posterior probability for each point in the grid\n", + "for i, xi in tqdm(enumerate(xgrid), total=len(xgrid), desc='Outer Loop'):\n", + " for j, xj in enumerate(xgrid):\n", + " X = [xi, xj]\n", + " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", + " confW[i, j] = max(post_w[1], post_w[2])\n", + " posteriorAware[i, j] = post_A[1]\n", + "\n", + "with plt.xkcd():\n", + "\n", + " # Plotting\n", + " plt.figure(figsize=(10, 5))\n", + "\n", + " # Posterior probability \"seen\"\n", + " plt.subplot(1, 2, 1)\n", + " plt.contourf(xgrid, xgrid, posteriorAware.T)\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('Posterior probability \"seen\"')\n", + " plt.axis('square')\n", + "\n", + " # Confidence in identity\n", + " plt.subplot(1, 2, 2)\n", + " contour_set = plt.contourf(xgrid, xgrid, confW.T)\n", + " plt.colorbar()\n", + " plt.contour(xgrid, xgrid, posteriorAware.T, levels=[0.5], linewidths=4, colors=['white']) # Line contour for threshold\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('Confidence in identity')\n", + " plt.axis('square')\n", + "\n", + " plt.show()" + ] + }, + { + 
"cell_type": "markdown", + "id": "2d129657-62aa-42d1-970a-93fd67736b69", + "metadata": { + "execution": {} + }, + "source": [ + "### Simulate KL-divergence surfaces\n", + "\n", + "We can also simulate KL-divergences (a measure of Bayesian surprise) at each layer in the network, which under predictive coding models of the brain, has been proposed to scale with neural activation (e.g., Friston, 2005; Summerfield & de Lange, 2014)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66044263-c8de-49a9-a56b-2e7336cc737c", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the grid\n", + "xgrid = np.arange(0, 2.01, 0.01)\n", + "\n", + "# Define the means for the Gaussian distributions\n", + "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", + "\n", + "# Define the covariance matrix\n", + "Sigma = np.array([[1, 0], [0, 1]])\n", + "\n", + "# Prior probabilities\n", + "Wprior = np.array([0.5, 0.5])\n", + "Aprior = 0.5\n", + "\n", + "# Initialize arrays to hold confidence and posterior probability\n", + "confW = np.zeros((len(xgrid), len(xgrid)))\n", + "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", + "\n", + "# Compute confidence and posterior probability for each point in the grid\n", + "for i, xi in enumerate(xgrid):\n", + " for j, xj in enumerate(xgrid):\n", + " X = [xi, xj]\n", + " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", + "\n", + " confW[i, j] = max(post_w[1], post_w[2])\n", + " posteriorAware[i, j] = post_A[1]\n", + "\n", + "# Calculate the mean K-L divergence for absent and present awareness states\n", + "KL_A_absent = np.mean(KL_A[posteriorAware < 0.5])\n", + "KL_A_present = np.mean(KL_A[posteriorAware >= 0.5])\n", + "KL_w_absent = np.mean(KL_w[posteriorAware < 0.5])\n", + "KL_w_present = np.mean(KL_w[posteriorAware >= 0.5])\n", + "\n", + "with 
plt.xkcd():\n", + "\n", + " # Plotting\n", + " plt.figure(figsize=(18, 6))\n", + "\n", + " # K-L divergence, perceptual states\n", + " plt.subplot(1, 2, 1)\n", + " plt.contourf(xgrid, xgrid, KL_w.T, cmap='viridis')\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('KL-divergence, perceptual states')\n", + " plt.axis('square')\n", + "\n", + " # K-L divergence, awareness state\n", + " plt.subplot(1, 2, 2)\n", + " plt.contourf(xgrid, xgrid, KL_A.T, cmap='viridis')\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('KL-divergence, awareness state')\n", + " plt.axis('square')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "b32e4908-0f6f-4259-832f-045adcb19700", + "metadata": { + "execution": {} + }, + "source": [ + "### Discussion point\n", + "\n", + "Can you recognise the difference between the KL divergence for the W-level and the one for the A-level?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7d8deb66-9a1d-49e1-a3ef-96970efa8d97", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# to_remove explanation\n", + "\"\"\"\n", + "At the level of perceptual states W, there is a substantial asymmetry in the KL-divergence expected when the\n", + "model says ‘seen’ vs. ‘unseen’ (lefthand panel). This is due to the large belief updates invoked in the\n", + "perceptual layer W by samples that deviate from the lower lefthand corner - from absence. In contrast, when\n", + "we compute KL-divergence for the A-level (righthand panel), the level of prediction error is symmetric across\n", + "seen and unseen decisions, leading to \"hot\" zones both at the upper righthand (present) and lower lefthand\n", + "(absent) corners of the 2D space.\n", + "\n", + "Intuitively, this means that at the W-level, there's a noticeable difference in the KL-divergence values\n", + "between \"seen\" and \"unseen\" predictions. 
This large difference is mainly due to significant updates in the\n", + "model's beliefs at this level when the detected samples are far from what is expected under the condition of\n", + "\"absence.\" However, when we analyze the K-L divergence at the A-level, the discrepancies in prediction errors\n", + "between \"seen\" and \"unseen\" are balanced. This creates equally strong responses in the model, whether something\n", + "is detected or not detected.\n", + "\n", + "We can also sort the KL-divergences as a function of whether the model \"reported\" presence or absence. As\n", + "can be seen in the bar plots below, there is more asymmetry in the prediction error at the W compared to the\n", + "A levels.\n", + "\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "869fc8f1-4199-4525-80b3-26e74babc66a", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "with plt.xkcd():\n", + "\n", + " # Create figure with specified size\n", + " plt.figure(figsize=(10, 5))\n", + "\n", + " # KL divergence for W states\n", + " plt.subplot(1, 2, 1)\n", + " plt.bar(['unseen', 'seen'], [KL_w_absent, KL_w_present], color='k')\n", + " plt.ylabel('KL divergence, W states')\n", + " plt.xticks(fontsize=18)\n", + " plt.yticks(fontsize=18)\n", + "\n", + " # KL divergence for A states\n", + " plt.subplot(1, 2, 2)\n", + " plt.bar(['unseen', 'seen'], [KL_A_absent, KL_A_present], color='k')\n", + " plt.ylabel('KL divergence, A states')\n", + " plt.xticks(fontsize=18)\n", + " plt.yticks(fontsize=18)\n", + "\n", + " plt.tight_layout()\n", + "\n", + " # Show plot\n", + " plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "64ecb92c-bfe3-4e49-bd40-f11ffa685ece", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_HOSS_Bonus_Content\")" + ] + }, + { + "cell_type": "markdown", + "id": 
"bcd87344-d473-44af-a881-b68e5471d353", + "metadata": { + "execution": {} + }, + "source": [ + "---\n", + "# Discussion\n", + "This section contains an extra discussion exercise if you have time and inclination." + ] + }, + { + "cell_type": "markdown", + "id": "ca33829c-8d54-437e-ba33-d3003af51d7a", + "metadata": { + "execution": {} + }, + "source": [ + "In this bonus section, Megan and Anil will delve into the complexities of defining and testing for consciousness, particularly in the context of artificial intelligence. We will explore various theoretical perspectives, examine classic and contemporary tests for consciousness, and discuss the challenges and ethical implications of determining whether a system truly possesses conscious experience." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21b621ea-1639-4131-8ec3-9cdf34a64f77", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Video 2: Consciousness Bonus Content\n", + "\n", + "from ipywidgets import widgets\n", + "from IPython.display import YouTubeVideo\n", + "from IPython.display import IFrame\n", + "from IPython.display import display\n", + "\n", + "class PlayVideo(IFrame):\n", + " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", + " self.id = id\n", + " if source == 'Bilibili':\n", + " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", + " elif source == 'Osf':\n", + " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", + " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", + "\n", + "def display_videos(video_ids, W=400, H=300, fs=1):\n", + " tab_contents = []\n", + " for i, video_id in enumerate(video_ids):\n", + " out = widgets.Output()\n", + " with out:\n", + " if video_ids[i][0] == 'Youtube':\n", + " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", + " height=H, fs=fs, rel=0)\n", + " print(f'Video available 
at https://youtube.com/watch?v={video.id}')\n", + " else:\n", + " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", + " height=H, fs=fs, autoplay=False)\n", + " if video_ids[i][0] == 'Bilibili':\n", + " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", + " elif video_ids[i][0] == 'Osf':\n", + " print(f'Video available at https://osf.io/{video.id}')\n", + " display(video)\n", + " tab_contents.append(out)\n", + " return tab_contents\n", + "\n", + "video_ids = [('Youtube', '00dL8q7WgcU'), ('Bilibili', 'BV12n4y1Q7C2')]\n", + "tab_contents = display_videos(video_ids, W=854, H=480)\n", + "tabs = widgets.Tab()\n", + "tabs.children = tab_contents\n", + "for i in range(len(tab_contents)):\n", + " tabs.set_title(i, video_ids[i][0])\n", + "display(tabs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39c202fb-f580-4a96-8f8e-bad24ed1d55c", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Video_2\")" + ] + }, + { + "cell_type": "markdown", + "id": "e9e839f2-e237-4ed4-9045-56dc7b5f6d60", + "metadata": { + "execution": {} + }, + "source": [ + "## Discussion activity: Is it actually conscious?" + ] + }, + { + "cell_type": "markdown", + "id": "2720c0b5-6386-43a6-9647-f1245531c376", + "metadata": { + "execution": {} + }, + "source": [ + "We discussed the difference between these two...\n", + "- \"Forward\" tests: passing means the machine is conscious (or intelligent).\n", + "- \"Reverse\" tests: passing means humans are convinced that a machine is conscious (or intelligent).\n", + "\n", + "**Discuss!** If a system (AI, other animal, other human) exhibited all the \"right signs\" of being conscious, how can we know for sure it is actually conscious? How could you design a test to be a true forward test?\n", + "\n", + "- Room 1: I think you could design a forward test in this way... 
[share your ideas]\n", + "- Room 2: I think a forward test is impossible, and here's why [share your ideas]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "84958157-c165-4cc3-be76-408999cf44ad", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Discussion_activity\")" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "include_colab_link": true, + "name": "W2D5_Tutorial3", + "provenance": [], + "toc_visible": true + }, + "kernel": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.22" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/W2D5_Mysteries/instructor/W2D5_Intro.ipynb b/tutorials/W2D5_Mysteries/instructor/W2D5_Intro.ipynb index bc018d3c2..e7bea670c 100644 --- a/tutorials/W2D5_Mysteries/instructor/W2D5_Intro.ipynb +++ b/tutorials/W2D5_Mysteries/instructor/W2D5_Intro.ipynb @@ -59,6 +59,19 @@ "feedback_prefix = \"W2D5_Intro\"" ] }, + { + "cell_type": "markdown", + "metadata": { + "execution": {} + }, + "source": [ + "# Mysteries \n", + "\n", + "Welcome to the final day of the NeuroAI course! You've covered a wide range of topics, and we hope you have enjoyed the content we've put together and put your mind to work absorbing the low-level as well as the high-level details of this - at times - tricky and mathematically detailed content. As you can tell from the title of this final day, we're switching to a different type of educational content. 
We're leaving you with some of the open mysteries in the field and talking you through some of the ongoing work aimed at finding solutions. \n", + "\n", + "We hope that, armed with the tools this course has given you, you'll be inspired by some of these active mysteries, and perhaps your name(s) will appear on future papers that help uncover the underlying mechanisms behind these fascinating ideas." + ] + }, { "cell_type": "markdown", "metadata": { @@ -214,7 +227,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial1.ipynb b/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial1.ipynb index 78da931b0..776abfb0d 100644 --- a/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial1.ipynb +++ b/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial1.ipynb @@ -25,9 +25,9 @@ "\n", "__Content creators:__ Steve Fleming, Guillaume Dumas, Samuele Bolotta, Juan David Vargas, Hakwan Lau, Anil Seth, Megan Peters\n", "\n", - "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault, Alex Murphy\n", "\n", - "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault\n" + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault, Alex Murphy\n" ] }, { @@ -50,7 +50,9 @@ "\n", "2. Explore core frameworks for analyzing consciousness, including diagnostic criteria, and will compare objective probabilities with subjective credences.\n", "\n", - "3. 
Explore reductionist theories of consciousness, such as Global Workspace Theory (GWT), theories of metacognition, and Higher-Order Thought (HOT) theories.\n" + "3. Explore reductionist theories of consciousness, such as Global Workspace Theory (GWT), theories of metacognition, and Higher-Order Thought (HOT) theories.\n", + "\n", + "The topic of consciousness and what it means to be *conscious* is a long-standing open question in neuroscience, and it has recently drawn a lot of attention in machine learning in the context of large language models and foundation models. People have claimed that these models exhibit sparks of consciousness, and a strong debate in the community continues. It is therefore likely to remain a big issue in the space of NeuroAI, and we hope you can start to build some familiarity with the tools used to quantify and study this fascinating topic.\n" ] }, { @@ -450,6 +452,7 @@ "\n", " # Close the figure to free up memory\n", " plt.close(fig)\n", + "\n", "# Function to configure the training environment and load the models\n", "def get_test_patterns(factor):\n", " \"\"\"\n", @@ -532,7 +535,7 @@ " discrimination_performances.append(discrimination_performance)\n", "\n", "\n", - " chance_level = torch.Tensor( generate_chance_level((200*factor,100)))\n", + " chance_level = torch.Tensor( generate_chance_level((200*factor,100))).to(device)\n", " discrimination_random= round((chance_level[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", " print(\"chance level\" , discrimination_random)\n", "\n", @@ -1024,7 +1027,9 @@ " \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n", " \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n", "\n", - " return device" + " return device\n", + "\n", + "device = set_device()" ] }, { @@ -1275,9 +1280,24 @@ "source": [ "In this section, we are exploring an important concept in machine learning: the idea 
that the complexity we observe in the physical world often arises from simpler, independently functioning parts. Think of the world as being made up of different modules or units that usually operate on their own but sometimes interact with each other. This is similar to how different apps on your phone work independently but can share information when needed.\n", + "\n", + "---\n", + "\n", + "### Modularity Recap\n", + "Remember in W2D1, our day entitled **Macrocircuits**? In Tutorial 3 of that day, the focus was on neural network modularity, and we showed you that, compared to a single holistic architecture, separable modular approaches, each with their own inductive biases, provided a much more efficient mechanism to model complex data. Not only that, but these sub-modules had stronger inductive biases and generalized easily to novel inputs. Today, we're shining a spotlight on a similar idea, but from a much more integrative perspective, applied to the grand challenge of modeling consciousness. Those of you who are interested should review that tutorial and its ideas on modularity and how it can support complex systems more efficiently than holistic unitary mechanisms.\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "1fb33b12", + "metadata": { + "execution": {} + }, + "source": [ + "This idea is closely linked to the field of causal inference, which studies how these separate units or mechanisms cause and influence each other. The goal is to understand and model how these mechanisms work both individually and together. Importantly, these mechanisms often interact only minimally, which means they can keep working properly even if changes occur in other parts. This characteristic makes them very robust, or capable of handling disturbances well.\n", + "\n", - "A specific example from machine learning that uses this idea is called Recurrent Independent Mechanisms (RIMs). 
In RIMs, different parts of the model mostly work independently, but they can also communicate or \"pay attention\" to each other when it’s necessary. This setup allows for efficient and dynamic processing of information. The research paper available here (https://arxiv.org/pdf/1909.10893) discusses this approach in detail. It highlights the benefits of designing models that recognize and utilize the independence and occasional interactions of these mechanisms. Such models are often more adaptable and can generalize better, meaning they perform well across a variety of different tasks or situations." + "A specific example from machine learning that uses this idea is called Recurrent Independent Mechanisms (RIMs). In RIMs, different parts of the model mostly work independently, but they can also communicate or \"pay attention\" to each other when it’s necessary. This setup allows for efficient and dynamic processing of information. The research paper available here (https://arxiv.org/pdf/1909.10893) discusses this approach in detail." ] }, { @@ -1287,7 +1307,7 @@ "execution": {} }, "source": [ - "### RIMs\n", + "### Recurrent Independent Mechanisms (RIMs)\n", "\n", "RIM networks are a type of recurrent neural network that process temporal sequences. Inputs are processed one element at a time, the different units of the network process the inputs, a hidden state is updated and propagated through time. RIM networks can thus be used as a drop-in replacement for RNNs like LSTMs or GRUs. The key differences are that:\n", "\n", @@ -1297,26 +1317,26 @@ "\n", "**Selecting the input**\n", "\n", - "Each RIM unit gets activated and updated when the input is pertinent to it. Using key-value attention, the queries originate from the RIMs, while the keys and values are derived from the current input. 
The key-value attention mechanisms enable dynamic selection of which variable instance (i.e., which entity or object) will serve as input to each RIM mechanism:\n", + "Recall in W1D5 (Microcircuits) we had a tutorial on **Attention** (Tutorial 3), where we covered how modern Transformer-based neural networks implement attention via the Query matrix, the Key matrix and the Value matrix? If not, you might benefit from reviewing the tutorial videos from that day as these concepts are used in the RIM networks we will look at today. Each RIM unit is activated and updated when the input is attended using the attention mechanism. Using key-value attention (KV matrices), the queries (Q matrix) originate from the RIMs, while the keys and values are derived from the current input. In standard deep learning terminology, this is very closely related to the concept of **cross-attention**. The key-value attention mechanisms enable dynamic selection of which variable instance (i.e., which entity or object) will serve as input to each RIM mechanism:\n", "\n", "$$\n", "\\text{Attention}(Q, K, V) = \\text{softmax}\\left(\\frac{Q K^T}{\\sqrt{d}}\\right) V\n", "$$\n", "\n", - "Linear transformations are used to construct keys $K = XW^e $, values $ V = XW^v $ and queries $Q = h_t W^q_k$.\n", + "Linear transformations are used to construct keys $K = XW^k $, values $ V = XW^v $ and queries $Q = h_t W^q_i$.\n", "\n", "Here:\n", "\n", - "* $ W^v $ is a matrix mapping from an input element to the corresponding value vector for the weighted attention\n", - "* $ W^e $ is a weight matrix which maps the input to the keys.\n", - "* $ W^q_k $ is a per-RIM weight matrix which maps from the RIM’s hidden state to its queries.\n", + "* $ W^k $ is a weight matrix which maps the input to the keys (Key matrix)\n", + "* $ W^v $ is a matrix mapping from an input element to the corresponding value vector for the weighted attention (Value matrix)\n", + "* $ W^q_i $ is a per-RIM weight matrix which maps from 
the RIM’s hidden state to its queries (Query matrix)\n", "* $h_t$ is the hidden state for a RIM mechanism.\n", "\n", "\n", "$\\oplus$ refers to the row-level concatenation operator. The attention thus is:\n", "\n", "$$\n", - "A^{(\\text{in})}_k = \\text{softmax}\\left(\\frac{h_t W^q_k (XW^e)^T}{\\sqrt{d_e}}\\right) XW^v, \\text{ where } \\theta^{(\\text{in})}_k = (W^q_k, W^e, W^v)\n", + "A^{(\\text{in})}_i = \\text{softmax}\\left(\\frac{h_t W^q_i (XW^k)^T}{\\sqrt{d_e}}\\right) XW^v, \\text{ where } \\theta^{(\\text{in})}_i = (W^q_i, W^k, W^v)\n", "$$\n", "\n", "At each step, the top-k RIMs are selected based on their attention scores for the actual input. Essentially, the RIMs compete at each step to read from the input, and only the RIMs that prevail in this competition are allowed to read from the input and update their state." @@ -1341,10 +1361,10 @@ "source": [ "This figure shows how RIMs work over two steps.\n", "\n", - "- Query generation: each RIM starts by creating a query. This query helps each RIM pull out the necessary information from the data it receives at that moment.\n", - "- Attention-based selection: on the right side of the figure, you can see that some RIMs are chosen to be active (colored in blue) and others stay inactive (colored in white). This selection is made using a special scoring system called attention, which picks RIMs based on how relevant they are to the current visual inputs.\n", - "- State transition for active RIMs: the RIMs that get activated update their internal states according to their specific rules, using the information they've gathered. The RIMs that aren’t activated don’t change and keep their previous states.\n", - "- Communication between RIMs: finally, the active RIMs share information with each other, but this communication is limited. They use a system similar to key-value pairing, which helps them share only the most important information needed for the next step." 
+ "- **Query generation**: each RIM starts by creating a query. This query helps each RIM pull out the necessary information from the data it receives at that moment.\n", + "- **Attention-based selection**: on the right side of the figure, you can see that some RIMs are chosen to be active (colored in blue) and others stay inactive (colored in white). This selection is made using a special scoring system called attention, which picks RIMs based on how relevant they are to the current visual inputs.\n", + "- **State transition for active RIMs**: the RIMs that get activated update their internal states according to their specific rules, using the information they've gathered. The RIMs that aren’t activated don’t change and keep their previous states.\n", + "- **Communication between RIMs**: finally, the active RIMs share information with each other, but this communication is limited. They use a system similar to key-value pairing, which helps them share only the most important information needed for the next step." ] }, { @@ -1354,7 +1374,7 @@ "execution": {} }, "source": [ - "To make this more concrete, consider the example of modeling the motion of two balls. We can think of each ball as an independent mechanism. Although both balls are affected by Earth’s gravity, and very slightly by each other, they generally move independently. They only interact significantly when they collide. This model captures the essence of independent mechanisms interacting sparsely, a key idea in developing more effective and generalizable AI systems.\n", + "To make this more concrete, consider the example of modeling the motion of two balls. We can think of each ball as an independent mechanism. Although both balls are affected by Earth’s gravity, and very slightly by each other, they generally move independently. The only significant interaction occurs when they collide. 
This model captures the essence of independent mechanisms interacting **sparsely**, a key idea in developing more effective and generalizable AI systems (see W1D5 - Tutorial 1 for the tutorial devoted entirely to sparsity and its benefits).\n", + "\n", + "Now, let's download the RIM model!" ] }, { @@ -1423,11 +1443,11 @@ "\n", "This is the test setup:\n", "\n", - "1. Train on 14x14 images of MNIST digits\n", + "1. Train on `14x14` images of MNIST digits\n", "2. Test on:\n", - " - 16x16 images (validation set 1)\n", - " - 19x19 images (validation set 2)\n", - " - 24x24 images (validation set 3)\n", + " - `16x16` images (validation set 1)\n", + " - `19x19` images (validation set 2)\n", + " - `24x24` images (validation set 3)\n", "\n", "This approach helps to understand whether the model can still recognize the digits accurately even when they appear at different scales or resolutions than those on which it was originally trained. By testing the model on various image sizes, we can determine how flexible and effective the model is at dealing with variations in input data.\n", "\n", @@ -1655,7 +1675,7 @@ "execution": {} }, "source": [ - "The accuracy of the model on 16x16 images is fairly close to what was observed on smaller images, indicating that the increase in size to 16x16 does not significantly impact the model's ability to recognize the images. However, RIMs demonstrate generalization better, when working with the larger 19x19 and 24x24 images - compared to LSTMs." + "The accuracy of the model on `16x16` images is fairly close to what was observed on smaller images, indicating that the increase in size to `16x16` does not significantly impact the model's ability to recognize the images. However, RIMs generalize better than LSTMs when working with the larger `19x19` and `24x24` images." 
] }, { @@ -1825,9 +1845,9 @@ "execution": {} }, "source": [ - "In this section, we explore a deep learning model based on Global Workspace Theory from cognitive neuroscience. You can read more about this model in the linked research paper here (https://arxiv.org/pdf/2103.01197.pdf). The core idea behind this model is the use of a \"shared global workspace\" which serves as a coordination platform for the various specialized modules within the network.\n", + "In this section, we explore a deep learning model based on Global Workspace Theory from cognitive neuroscience. You can read more about this model in the linked research paper here (https://arxiv.org/pdf/2103.01197.pdf). The core idea behind this model is the use of a *shared global workspace* which serves as a coordination platform for the various specialized modules within the network.\n", "\n", - "Essentially, the model incorporates multiple specialist modules, each focusing on different aspects of a problem. Unlike in the RIM mechanism, these modules do not communicate directly with each other, but rather interact through a central shared memory. Communication with the centeral shared memory is handled, once again, by an attention mechanism." + "Essentially, the model incorporates multiple specialist modules, each focusing on different aspects of a problem. Unlike in the RIM mechanism, these modules do not communicate *directly* with each other, but rather interact *indirectly* through a central shared memory. Communication with the central shared memory is handled, once again, by an attention mechanism." ] }, { @@ -1847,7 +1867,7 @@ "execution": {} }, "source": [ - "By centralizing communication this way, the model mimics how a human brain might focus only on the most relevant information at any given time. It mimics a sort of \"cognitive economy,\" where only the most relevant data is processed and shared among modules, reducing redundancy and enhancing the overall performance of the system. 
Moreover, the theory embeds some of the assumptions of the Global Workspace Theory (GWT) of consciousness, which suggests that consciousness arises from the ability of various brain processes to access a shared information platform, the **Global Workspace**." + "By centralizing communication this way, the model mimics how a human brain might focus only on the most relevant information at any given time. It mimics a sort of \"cognitive economy,\" where only the most relevant data is processed and shared among modules, filtered through a bottleneck that forces the model to use a highly efficient, redundancy-reducing representation, which enhances the overall performance of the system. Moreover, the theory embeds some of the assumptions of the Global Workspace Theory (GWT) of consciousness, which suggests that consciousness arises from the ability of various brain processes to access a shared information platform, the **Global Workspace**." ] }, { @@ -1867,13 +1887,17 @@ "execution": {} }, "source": [ - "In our model, the interaction among the modules (or specialists) and the shared workspace is managed by a key-query-value cross-attention mechanism. Here’s how it works:\n", + "In our model, the interaction among the modules (or specialists) and the shared workspace is managed by a QKV cross-attention mechanism, as explained above. \n", + "\n", + "Here’s how it works:\n", + "\n", + "- **Key**: Each specialist module generates a key which represents the type of information the module wants to share.\n", + "- **Query**: The workspace generates a query at each computational step. 
This query represents what the workspace needs to know next to facilitate the overall task.\n", + "- **Value**: Each specialist also prepares a value, which is the actual information it proposes to add to the workspace.\n", "\n", - "- Key: Each specialist module generates a key which represents the type of information the module wants to share.\n", - "- Query: The workspace generates a query at each computational step. This query represents what the workspace needs to know next to facilitate the overall task.\n", - "- Value: Each specialist also prepares a value, which is the actual information it proposes to add to the workspace.\n", + "If any of this is still unclear, refer back to the textual explanation above, where these terms are defined in more detail.\n", "\n", - "Fill in the code below to implement this mechanism." + "Your task is to fill in the code below to implement this mechanism." ] }, { @@ -2030,7 +2054,9 @@ "execution": {} }, "source": [ - "After updating the shared workspace with the most critical signals, this information is then broadcast back to all specialists. Each specialist updates its state using this broadcast information, which can involve an attention mechanism for consolidation and an update function (like an LSTM or GRU step) based on the new combined state. Let's add this method!" + "After updating the shared workspace with the most critical signals, this information is then broadcast back to all specialists. Each specialist updates its state using this broadcast information, which can involve an attention mechanism for consolidation and an update function (like an LSTM or GRU step) based on the new combined state. \n", "\n", + "Let's add this method!" 
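The key-query-value write step described above can be sketched in NumPy. This is a rough illustration, not the exercise's solution: the projection matrices are randomly initialized stand-ins, and the real model learns them end to end.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_specialists, d_state, d_attn, n_slots = 4, 8, 6, 2

# Hidden states of the specialist modules and the shared workspace slots
specialists = rng.normal(size=(n_specialists, d_state))
workspace = rng.normal(size=(n_slots, d_state))

# Hypothetical projection matrices (random here, learned in the real model)
W_q = rng.normal(size=(d_state, d_attn))    # workspace -> queries
W_k = rng.normal(size=(d_state, d_attn))    # specialists -> keys
W_v = rng.normal(size=(d_state, d_state))   # specialists -> values

# Write step: workspace queries attend over specialist keys, and the
# winning specialists' values update the workspace slots
Q = workspace @ W_q                                   # (n_slots, d_attn)
K = specialists @ W_k                                 # (n_specialists, d_attn)
V = specialists @ W_v                                 # (n_specialists, d_state)
attn = softmax(Q @ K.T / np.sqrt(d_attn), axis=-1)    # (n_slots, n_specialists)
workspace = attn @ V                                  # updated workspace slots
```

The softmax over specialists implements the competition for workspace access: each slot's attention weights sum to one, so only the most relevant specialists contribute strongly to the update.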
] }, { @@ -2205,9 +2231,7 @@ "execution": {} }, "source": [ - "Blindsight is a neurological phenomenon where individuals with damage to their primary visual cortex can still respond to visual stimuli without consciously perceiving them.\n", - "\n", - "\n", + "Blindsight is a neurological phenomenon where individuals with damage to their primary visual cortex can still respond to visual stimuli without consciously perceiving them. Megan showed you a video of a real patient with this condition navigating a corridor and successfully avoiding objects that researchers had strategically placed in his way.\n", "\n", "To study this, we use a simulated dataset that mimics the conditions of blindsight. This dataset contains 400 patterns, equally split between two types:\n", "\n", @@ -2805,11 +2829,11 @@ "execution": {} }, "source": [ - "In this section, we'll merge ideas from earlier discussions to present a fresh perspective on how conscious awareness might arise in neural systems. This view comes from higher-order theory, which suggests that consciousness stems from the ability to monitor basic, or first-order, information processing activities, instead of merely broadcasting information globally. This concept agrees with global workspace theories that emphasize the need for a comprehensive monitor that oversees various first-order processes. Moreover, it extends the ideas discussed previously about the role of a second-order network, which helps us understand phenomena like blindsight, where a person can respond to visual stimuli without consciously seeing them.\n", + "In this section, we'll merge ideas from earlier discussions to present a fresh perspective on how conscious awareness might arise in neural systems. This view comes from higher-order theory, which suggests that consciousness stems from the ability to monitor basic, or first-order, information processing activities, instead of merely broadcasting information globally. 
This concept emphasizes the need for a comprehensive monitor that oversees various first-order processes (as in GWT). It extends the idea of a second-order network, which helps us understand phenomena like blindsight.\n", "\n", - "To analyze how our brains handle and update perceptions, we'll operate within a simplified Bayesian framework. This framework helps us evaluate how we perceive reality based on the information we receive. For example, if you hear rustling leaves, your brain calculates the likelihood of it being caused by the wind versus an animal. This calculation involves updating what we initially guess (our prior belief) with new evidence (observed data), resulting in a new, more informed belief (posterior probability).\n", + "To analyze how our brains handle and update perceptions, we'll use a Bayesian framework. This framework helps us evaluate how we perceive reality based on the information we receive. For example, if you hear rustling leaves, your brain calculates the likelihood of it being caused by the wind versus an animal. This calculation involves updating what we initially guess (our prior belief) with new evidence (observed data), resulting in a new, more informed belief (posterior probability).\n", "\n", - "The function below calculates these updated beliefs and uses Kullback-Leibler (KL) divergence to quantify how much the new information changes our understanding. The KL divergence is a way of measuring the 'distance' between the initial belief and your updated belief. In essence, it's measuring how much you have to change your mind given new evidence.\n", + "The function below calculates these updated beliefs and uses *Kullback-Leibler (KL)* divergence to quantify how much the new information changes our understanding. The KL divergence is a way of measuring the 'distance' between the initial belief and your updated belief. 
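As a stand-alone sketch of this belief update (the notebook's own function differs in its details, and the priors and likelihoods below are made-up numbers for the rustling-leaves example):

```python
import numpy as np

def bayes_update_and_kl(prior, likelihood):
    """Return the posterior over hypotheses and KL(posterior || prior) in nats."""
    unnormalized = prior * likelihood          # Bayes' rule, numerator
    posterior = unnormalized / unnormalized.sum()
    kl = float(np.sum(posterior * np.log(posterior / prior)))
    return posterior, kl

# Rustling leaves: prior belief over (wind, animal), and how well each
# hypothesis explains the sound (illustrative likelihood values)
prior = np.array([0.7, 0.3])
likelihood = np.array([0.2, 0.9])
posterior, kl = bayes_update_and_kl(prior, likelihood)
```

Here the evidence favors "animal," so the posterior shifts away from the wind-heavy prior, and the KL divergence measures exactly how large that shift was.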
In essence, it's measuring how much you have to change your mind given new evidence.\n", "\n", "We base our analysis on a flat, or single-layer, Bayesian network model. This model directly connects our sensory inputs with our perceptual states, simplifying the complex interactions in our brain into a more manageable form. By stripping away the complexities of multi-layered networks, we focus purely on how direct observations impact our consciousness. This simplified approach helps us to better understand the intricate dance between perception and awareness in our neural systems." ] @@ -2857,16 +2881,6 @@ " return post_W, KL_W" ] }, - { - "cell_type": "markdown", - "id": "e4cfba4a-b48a-48c5-a554-f03e7096af2e", - "metadata": { - "execution": {} - }, - "source": [ - "**Make our stimulus space**" - ] - }, { "cell_type": "markdown", "id": "11ffb999-c213-4400-8f1b-dac5b42ff5e1", @@ -2874,11 +2888,13 @@ "execution": {} }, "source": [ - "The model we are using is grounded in classical \"signal detection theory\", or SDT for short. SDT is in turn a special case of a Bayesian generative model, in which an arbitrary \"evidence\" value is drawn from an unknown distribution, and the task of the observer is to infer which distribution this evidence came from.\n", + "### Defining our Stimulus Space\n", + "\n", + "The model we are using is grounded in classical *Signal Detection Theory* (SDT). SDT is a special case of a Bayesian generative model, in which an arbitrary *evidence* value is drawn from an unknown distribution. The task of the observer is to infer *which distribution* this evidence came from.\n", "\n", - "In SDT, an observer receives a piece of evidence—this could be any sensory input, like a sound, a light signal, or a statistical data point. The evidence comes from one of several potential distributions. 
Each distribution represents a different \"state of the world.\" For instance, one distribution might represent the presence of a signal (like a beep), while another might represent just noise. The observer uses Bayesian inference to assess the probability that the received evidence came from one distribution or another. This involves updating their beliefs (probabilities) based on the new evidence. Based on the probabilities calculated through Bayesian inference, the observer decides which distribution most likely produced the evidence.\n", + "In SDT, an observer receives a piece of evidence (this could be any sensory input, like a sound, a light signal, or a statistical data point). The evidence comes from one of several potential distributions. Each distribution represents a different *state of the world.* For instance, one distribution might represent the presence of a signal (like a beep), while another might represent just noise. The observer uses Bayesian inference to assess the probability that the received evidence came from one distribution or another. This involves updating their beliefs (probabilities) based on the new evidence. Based on the probabilities calculated through Bayesian inference, the observer decides which distribution most likely produced the evidence.\n", "\n", - "Let's now imagine we have two categories, A and B - for instance, left- and right-tilted visual stimuli. The sensory \"evidence\" can be written as 2D vector, where the first element is evidence for A, and the second element evidence for B:" + "Let's now imagine we have two categories, A and B - for instance, left- and right-tilted visual stimuli. The sensory *evidence* can be written as a 2D vector, where the first element is evidence for A, and the second element evidence for B:" ] }, { @@ -3023,7 +3039,7 @@ "execution": {} }, "source": [ - "The model partitions the stimuli in the expected way. 
KL divergence is higher further away from the boundaries, as measuring stimuli far away from the boundaries makes the model rapidly update its beliefs. This is because the model is more certain about the stimuli's class when they are far from the boundaries." + "The model partitions the stimuli in the expected way. KL divergence is higher further away from the boundaries, as measuring stimuli far away from the boundaries makes the model rapidly update its beliefs. This is because the model is more *certain* about the stimuli's class when they are far from the boundaries." ] }, { @@ -3033,11 +3049,11 @@ "execution": {} }, "source": [ - "**Add in higher-order node for global detection**\n", + "#### Add in higher-order node for global detection\n", "\n", - "So far, our model has been straightforward, or \"flat,\" where each perceptual state (like leftward tilt, rightward tilt, or no stimulus) is treated separately. However, real-life perception often requires more complex judgments about the presence or absence of any stimulus, not just identifying specific types. This is where a higher-order node comes into play.\n", + "So far, our model has been straightforward, or *flat*, where each perceptual state (like leftward tilt, rightward tilt, or no stimulus) is treated separately. However, real-life perception often requires more complex judgments about the presence or absence of any stimulus, not just identifying specific types. This is where a higher-order node comes into play.\n", "\n", - "**Introducing the \"A\" Level:**\n", + "#### Introducing the \"A\" Level:\n", "\n", "Think of the \"A\" level as a kind of overseer or monitor that watches over the lower-level states ($w_1$, $w_2$, etc.). This higher-order node isn't concerned with the specific content of the stimulus (like which direction something is tilting) but rather with whether there's any significant stimulus at all versus just noise. 
It takes inputs from the same data (pairs of $X$'s), but it adds a layer of awareness. It evaluates whether the data points suggest any meaningful content or if they're likely just random noise.\n", "\n", @@ -3331,8 +3347,11 @@ "execution": {} }, "source": [ - "**Simulate ignition (asymmetry vs. symmetry)**\n", + "We have included some further details on the notion of ignition. Please feel free to toggle the switch below to learn more. If you're running low on time, then please feel free to run the cell below and come back to this section. The outro video will also cover the broad overview of this concept.\n", "\n", + "
\n", + " Simulate Ignition (assymetry vs symmetry)\n", + " \n", "The HOSS architecture is designed to detect whether something is there or not. When it detects something, it ends up making more prediction errors in its predictions compared to when it detects nothing. These prediction errors are tracked using a method called Kullback-Leibler (KL) divergence, particularly at a certain level within the model known as the W level.\n", "\n", "This increase in prediction errors when something is detected is similar to what happens in the human brain, a phenomenon known as global ignition responses. These are big surges in brain activity that happen when we become conscious of something. Research like that conducted by Del Cul et al. (2007) and Dehaene and Changeux (2011) support this concept, linking it to the global workspace model. This model describes consciousness as the sharing of information across different parts of the brain.\n", @@ -3341,126 +3360,9 @@ "\n", "We then classify these prediction errors based on whether the model recognizes a stimulus as \"seen\" or \"unseen.\" If the model has a response indicating \"seen,\" it shows more activity than when it indicates \"unseen.\" This is what we refer to as ignition — more activity for \"seen\" stimuli.\n", "\n", - "However, it's crucial to understand that in the HOSS model, these ignition-like responses don't directly cause the global sharing of information in the network. Rather, they are secondary effects or byproducts of other calculations happening within the network. Essentially, these bursts of activity are outcomes of deeper processes in the network, not the direct mechanisms for distributing information throughout the system." 
- ] - }, - { - "cell_type": "markdown", - "id": "b09f812a-f202-4f3d-ac66-247b322002e7", - "metadata": { - "colab_type": "text", - "execution": {} - }, - "source": [ - "```python\n", - "# Experiment parameters\n", - "mu = np.array([[0.5, 0.5], [3.5, 0.5], [0.5, 3.5]])\n", - "Nsubjects = 30\n", - "Ntrials = 600\n", - "cond = np.concatenate((np.ones(Ntrials//3), np.ones(Ntrials//3)*2, np.ones(Ntrials//3)*3))\n", - "Wprior = [0.5, 0.5]\n", - "Aprior = 0.5\n", - "\n", - "# Sensory precision values\n", - "gamma = np.linspace(0.1, 10, 6)\n", - "\n", - "# Initialize lists for results\n", - "all_KL_w_yes = []\n", - "sem_KL_w_yes = []\n", - "all_KL_w_no = []\n", - "sem_KL_w_no = []\n", - "all_KL_A_yes = []\n", - "sem_KL_A_yes = []\n", - "all_KL_A_no = []\n", - "sem_KL_A_no = []\n", - "all_prob_y = []\n", - "\n", - "##############################################################################\n", - "## TODO for students: Fill in the missing parts (...)\n", - "## Fill in the missing parts to complete the function and remove\n", - "raise NotImplementedError(\"Student exercise\")\n", - "##############################################################################\n", - "\n", - "for y in tqdm(..., desc='Processing gammas'):\n", - " Sigma = np.diag([1./np.sqrt(y)]*2)\n", - " mean_KL_w = np.zeros((Nsubjects, 4))\n", - " mean_KL_A = np.zeros((Nsubjects, 4))\n", - " prob_y = np.zeros(Nsubjects)\n", - "\n", - " for s in tqdm(range(Nsubjects), desc=f'Subjects for gamma={y}', leave=False):\n", - " KL_w = np.zeros(len(cond))\n", - " KL_A = np.zeros(len(cond))\n", - " posteriorAware = np.zeros(len(cond))\n", - "\n", - " # Generate sensory samples\n", - " X = np.array([multivariate_normal.rvs(mean=mu[int(c)-1, :], cov=Sigma) for c in cond])\n", - "\n", - " # Model inversion for each trial\n", - " for i, x in enumerate(X):\n", - " post_w, post_A, KL_w[i], KL_A[i] = HOSS_evaluate(x, mu, Sigma, Aprior, Wprior)\n", - " posteriorAware[i] = post_A[1] # Assuming post_A is a tuple with awareness 
probability at index 1\n", - "\n", - " binaryAware = posteriorAware > 0.5\n", - " for i in range(4):\n", - " conditions = [(cond == 1), (cond != 1), (cond == 1), (cond != 1)]\n", - " aware_conditions = [(binaryAware == 0), (binaryAware == 0), (binaryAware == 1), (binaryAware == 1)]\n", - " mean_KL_w[s, i] = np.mean(KL_w[np.logical_and(aware_conditions[i], conditions[i])])\n", - " mean_KL_A[s, i] = np.mean(KL_A[np.logical_and(aware_conditions[i], conditions[i])])\n", - "\n", - " prob_y[s] = np.mean(binaryAware[cond != 1])\n", - "\n", - " # Aggregate results across subjects\n", - " all_KL_w_yes.append(np.nanmean(mean_KL_w[:, 2:4].flatten()))\n", - " sem_KL_w_yes.append(np.nanstd(mean_KL_w[:, 2:4].flatten()) / np.sqrt(Nsubjects))\n", - " all_KL_w_no.append(np.nanmean(mean_KL_w[:, :2].flatten()))\n", - " sem_KL_w_no.append(np.nanstd(mean_KL_w[:, :2].flatten()) / np.sqrt(Nsubjects))\n", - " all_KL_A_yes.append(np.nanmean(mean_KL_A[:, 2:4].flatten()))\n", - " sem_KL_A_yes.append(np.nanstd(mean_KL_A[:, 2:4].flatten()) / np.sqrt(Nsubjects))\n", - " all_KL_A_no.append(np.nanmean(mean_KL_A[:, :2].flatten()))\n", - " sem_KL_A_no.append(np.nanstd(mean_KL_A[:, :2].flatten()) / np.sqrt(Nsubjects))\n", - " all_prob_y.append(np.nanmean(prob_y))\n", - "\n", - "with plt.xkcd():\n", - "\n", - " # Create figure\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # First subplot: Probability of reporting \"seen\" for w_1 or w_2\n", - " plt.subplot(1, 3, 1)\n", - " plt.plot(gamma, all_prob_y, linewidth=2)\n", - " plt.xlabel('Stimulus strength')\n", - " plt.ylabel('Prob. 
report \"seen\" for w_1 or w_2')\n", - " plt.xticks(fontsize=14)\n", - " plt.yticks(fontsize=14)\n", - " plt.box(False)\n", - "\n", - " # Second subplot: K-L divergence, perceptual states\n", - " plt.subplot(1, 3, 2)\n", - " plt.errorbar(gamma, all_KL_w_yes, yerr=sem_KL_w_yes, linewidth=2, label='Seen')\n", - " plt.errorbar(gamma, all_KL_w_no, yerr=sem_KL_w_no, linewidth=2, label='Unseen')\n", - " plt.legend(frameon=False)\n", - " plt.xlabel('Stimulus strength')\n", - " plt.ylabel('KL-divergence, perceptual states')\n", - " plt.xticks(fontsize=14)\n", - " plt.yticks(fontsize=14)\n", - " plt.box(False)\n", - "\n", - " # Third subplot: K-L divergence, awareness state\n", - " plt.subplot(1, 3, 3)\n", - " plt.errorbar(gamma, all_KL_A_yes, yerr=sem_KL_A_yes, linewidth=2, label='Seen')\n", - " plt.errorbar(gamma, all_KL_A_no, yerr=sem_KL_A_no, linewidth=2, label='Unseen')\n", - " plt.legend(frameon=False)\n", - " plt.xlabel('Stimulus strength')\n", - " plt.ylabel('KL-divergence, awareness state')\n", - " plt.xticks(fontsize=14)\n", - " plt.yticks(fontsize=14)\n", - " plt.box(False)\n", - "\n", - " # Adjust layout and display the figure\n", - " plt.tight_layout()\n", - " plt.show()\n", + "However, it's crucial to understand that in the HOSS model, these ignition-like responses don't directly cause the global sharing of information in the network. Rather, they are secondary effects or byproducts of other calculations happening within the network. Essentially, these bursts of activity are outcomes of deeper processes in the network, not the direct mechanisms for distributing information throughout the system.\n", "\n", - "```" + "
" ] }, { @@ -3472,8 +3374,6 @@ }, "outputs": [], "source": [ - "# to_remove solution\n", - "\n", "# Experiment parameters\n", "mu = np.array([[0.5, 0.5], [3.5, 0.5], [0.5, 3.5]])\n", "Nsubjects = 30\n", @@ -3724,9 +3624,9 @@ }, "source": [ "---\n", - "# Summary\n", + "# The Big Picture\n", "\n", - "Hakwan will now discuss the critical aspects and limitations of current consciousness studies, addressing the challenges in distinguishing theories of consciousness from those merely describing general brain functions." + "Hakwan will now discuss the critical aspects and limitations of current consciousness studies, addressing the challenges in distinguishing theories of consciousness from those merely describing general brain functions. Join us in the next two videos where we wrap up some of the big ideas and try to put them in context for you!" ] }, { @@ -3884,743 +3784,9 @@ "execution": {} }, "source": [ - "Below you'll find some optional coding & discussion bonus content!" - ] - }, - { - "cell_type": "markdown", - "id": "f862cbc2-3222-484c-98cb-993f2b591b37", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "# Coding Bonus Section\n", - "This secton contains some extra coding exercises in case you have time and inclination." - ] - }, - { - "cell_type": "markdown", - "id": "76dd7488-6558-4022-8541-22765f2967c6", - "metadata": { - "execution": {} - }, - "source": [ - "## Bonus coding exersice 1: Train a first-order network\n", - "\n", - "This section invites you to engage with a straightforward, auto-generated dataset on blindsight, originally introduced by [Pasquali et al. in 2010](https://www.sciencedirect.com/science/article/abs/pii/S0010027710001794). Blindsight is a fascinating condition where individuals who are cortically blind due to damage in their primary visual cortex can still respond to visual stimuli without conscious perception. 
This intriguing phenomenon underscores the intricate nature of sensory processing and the brain's ability to process information without conscious awareness." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8c79b0a2-8e12-44ea-a685-bba788f6685d", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Visualize the autogenerated data\n", - "factor=2\n", - "initialize_global()\n", - "set_pre, _ = create_patterns(0,factor)\n", - "plot_signal_max_and_indicator(set_pre.detach().cpu(), \"Example - Pre training dataset\")" - ] - }, - { - "cell_type": "markdown", - "id": "cff70408-8662-43f5-b930-fc2a6ffca323", - "metadata": { - "execution": {} - }, - "source": [ - "The pre-training dataset for the network consisted of 200 patterns. These were evenly divided: half were purely noise (with unit activations randomly chosen between 0.0 and 0.02), and the other half represented potential stimuli. In the stimulus patterns, 99 out of 100 units had activations ranging between 0.0 and 0.02, with one unique unit having an activation between 0.0 and 1.0." - ] - }, - { - "cell_type": "markdown", - "id": "0f45662a-08b4-44a4-89e1-200fc0c9cddb", - "metadata": { - "execution": {} - }, - "source": [ - "**Testing patterns**\n", - "\n", - "As we have seen before, the network underwent evaluations under three distinct conditions, each modifying the signal-to-noise ratio in a unique way to explore different degrees and types of blindness.\n", - "\n", - "Suprathreshold stimulus condition: here, the network was exposed to the identical set of 200 patterns used during pre-training, testing the network's response to familiar inputs.\n", - "\n", - "Subthreshold stimulus condition (blindsight simulation): this condition aimed to mimic blindsight. It was achieved by introducing a slight noise increment (+0.0012) to every input of the first-order network, barring the one designated as the stimulus. 
This setup tested the network's ability to discern faint signals amidst noise.\n", - "\n", - "Low vision condition: to simulate low vision, the activation levels of the stimuli were reduced. Unlike the range from 0.0 to 1.0 used in pre-training, the stimuli's activation levels were adjusted to span from 0.0 to 0.3. This condition examined the network's capability to recognize stimuli with diminished intensity." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "db58d78b-17d8-4651-801a-f06e568a7322", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "factor=2\n", - "# Compare your results with the patterns generate below\n", - "set_1, _ = create_patterns(0,factor)\n", - "set_2, _ = create_patterns(1,factor)\n", - "set_3, _ = create_patterns(2,factor)\n", - "\n", - "# Plot\n", - "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", - "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", - "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" - ] - }, - { - "cell_type": "markdown", - "id": "cd5c13e0-75e8-45c2-b1be-70496041364b", - "metadata": { - "execution": {} - }, - "source": [ - "### Activity 1: Building a network for a blindsight situation\n", - "\n", - "In this activity, we'll construct a neural network model using our auto-generated dataset, focusing on blindsight scenarios. The model will primarily consist of fully connected layers, establishing a straightforward, first-order network. The aim here is to assess the basic network's performance.\n", - "\n", - "**Steps to follow**\n", - "\n", - "1. Examine the network architecture: understand the structure of the neural network you're about to work with.\n", - "2. Visualize loss metrics: observe and analyze the network's performance during pre-training by visualizing the loss over epochs.\n", - "3. 
Evaluate the model: use the provided code snippets to calculate and interpret the model's accuracy, recall, and F1-score, giving you insight into the network's capabilities.\n", - "\n", - "**Understanding the process**\n", - "\n", - "The goal is to gain a thorough comprehension of the network's architecture and to interpret the pre-training results visually. This will provide a clearer picture of the model's potential and limitations.\n", - "\n", - "The network is designed as a backpropagation autoassociator. It features a 100-unit input layer, directly linked to a 40-unit hidden layer, which in turn connects to a 100-unit output layer. Initial connection weights are set within the range of -1.0 to 1.0 for the first-order network. To mitigate overfitting, dropout is employed within the network architecture. The architecture includes a configurable activation function. This flexibility allows for adjustments and tuning in Activity 3, aiming for optimal model performance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "94d0bcaf-8b49-4e35-b0d2-1b9dcc98b182", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "class FirstOrderNetwork(nn.Module):\n", - " def __init__(self, hidden_units, data_factor, use_gelu):\n", - " \"\"\"\n", - " Initializes the FirstOrderNetwork with specific configurations.\n", - "\n", - " Parameters:\n", - " - hidden_units (int): The number of units in the hidden layer.\n", - " - data_factor (int): Factor to scale the amount of data processed.\n", - " A factor of 1 indicates the default data amount,\n", - " while 10 indicates 10 times the default amount.\n", - " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", - " \"\"\"\n", - " super(FirstOrderNetwork, self).__init__()\n", - "\n", - " # Define the encoder, hidden, and decoder layers with specified units\n", - "\n", - " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", - " self.hidden= 
nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", - " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", - "\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", + "Tutorial 3 today contains some bonus material based on extensions of what we've covered today. There, you'll find some optional coding & discussion bonus content! Feel free to bookmark and come back to it whenever you are ready. \n", "\n", - "\n", - " # Dropout layer to prevent overfitting\n", - " self.dropout = nn.Dropout(0.1)\n", - "\n", - " # Set the data factor\n", - " self.data_factor = data_factor\n", - "\n", - " # Other activation functions for various purposes\n", - " self.softmax = nn.Softmax()\n", - "\n", - " # Initialize network weights\n", - " self.initialize_weights()\n", - "\n", - " def initialize_weights(self):\n", - " \"\"\"Initializes weights of the encoder, hidden, and decoder layers uniformly.\"\"\"\n", - " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", - "\n", - " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", - " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", - "\n", - " def encoder(self, x):\n", - " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", - " return h1\n", - "\n", - " def decoder(self,z):\n", - " #h2 = self.relu(self.hidden(z))\n", - " h2 = self.sigmoid(self.fc2(z))\n", - " return h2\n", - "\n", - "\n", - " def forward(self, x):\n", - " \"\"\"\n", - " Defines the forward pass through the network.\n", - "\n", - " Parameters:\n", - " - x (Tensor): The input tensor to the network.\n", - "\n", - " Returns:\n", - " - Tensor: The output of the network after passing through the layers and activations.\n", - " \"\"\"\n", - " h1 = self.encoder(x)\n", - " h2 = self.decoder(h1)\n", - "\n", - " return h1 , h2" - ] - }, - { - "cell_type": "markdown", - "id": "83e07f1a-540b-4bfa-8e9b-f114319f1f96", - "metadata": { - "execution": {} - }, - "source": [ - "For now, we will train the first order network only." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7bfade3d-6385-459c-8f07-e3017264455a", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", - "# Hyperparameters\n", - "\n", - "# Hyperparameters\n", - "global optimizer ,n_epochs , learning_rate_1\n", - "learning_rate_1 = 0.5\n", - "n_epochs = 100\n", - "optimizer=\"ADAMAX\"\n", - "hidden=40\n", - "factor=2\n", - "gelu=False\n", - "gam=0.98\n", - "meta=True\n", - "stepsize=25\n", - "initialize_global()\n", - "\n", - "\n", - "# Networks instantiation\n", - "first_order_network = FirstOrderNetwork(hidden,factor,gelu).to(device)\n", - "second_order_network = SecondOrderNetwork(gelu).to(device) # We define it, but won't use it until activity 3\n", - "\n", - "# Loss function\n", - "criterion_1 = CAE_loss\n", - "\n", - "# Optimizer\n", - "optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", - "\n", - "# Learning rate schedulers\n", - "scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", - "\n", - "max_values_output_first_order = []\n", - "max_indices_output_first_order = []\n", - "max_values_patterns_tensor = []\n", - "max_indices_patterns_tensor = []\n", - "\n", - "# Training loop\n", - "for epoch in range(n_epochs):\n", - " # Generate training patterns and targets for each epoch.\n", - " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", - "\n", - " # Forward pass through the first-order network\n", - " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", - "\n", - " output_first_order=output_first_order.requires_grad_(True)\n", - "\n", - " # Skip computations for the second-order network\n", - " with torch.no_grad():\n", - "\n", - " # Potentially forward pass through the second-order network without tracking gradients\n", - " 
output_second_order = second_order_network(patterns_tensor, output_first_order)\n", - "\n", - " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", - " W = first_order_network.state_dict()['fc1.weight']\n", - " loss_1 = criterion_1( W, stim_present_tensor.view(-1, 100), output_first_order,\n", - " hidden_representation, lam )\n", - " # Backpropagate the first-order network's loss\n", - " loss_1.backward()\n", - "\n", - " # Update first-order network weights\n", - " optimizer_1.step()\n", - "\n", - " # Reset first-order optimizer gradients to zero for the next iteration\n", - "\n", - " # Update the first-order scheduler\n", - " scheduler_1.step()\n", - "\n", - " epoch_1_order[epoch] = loss_1.item()\n", - "\n", - " # Get max values and indices for output_first_order\n", - " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", - " max_inds_out[max_vals_out == 0] = 0\n", - " max_values_output_first_order.append(max_vals_out.tolist())\n", - " max_indices_output_first_order.append(max_inds_out.tolist())\n", - "\n", - " # Get max values and indices for patterns_tensor\n", - " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", - " max_inds_pat[max_vals_pat == 0] = 0\n", - " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", - " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", - "\n", - "\n", - "max_values_indices = (max_values_output_first_order[-1],\n", - " max_indices_output_first_order[-1],\n", - " max_values_patterns_tensor[-1],\n", - " max_indices_patterns_tensor[-1])\n", - "\n", - "\n", - "# Plot training loss curve\n", - "pre_train_plots(epoch_1_order, epoch_2_order, \"1st & 2nd Order Networks\" , max_value_indices )" - ] - }, - { - "cell_type": "markdown", - "id": "fa7d1ce9-da1b-4f78-b388-47bfcd50c6dd", - "metadata": { - "execution": {} - }, - "source": [ - "### Testing under 3 blindsight conditions\n", - "\n", - "We will now use the testing auto-generated 
datasets from activity 1 to test the network's performance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2affe162-f4d9-495f-862a-65b0f50ca5ef", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "results_seed=[]\n", - "discrimination_seed=[]\n", - "\n", - "# Prepare networks for testing by calling the configuration function\n", - "testing_patterns, n_samples, loaded_model, loaded_model_2 = config_training(first_order_network, second_order_network, hidden, factor, gelu)\n", - "\n", - "# Perform testing using the defined function and plot the results\n", - "f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, loaded_model, loaded_model_2,factor)\n", - "\n", - "results_seed.append(results_for_plotting)\n", - "discrimination_seed.append(discrimination_performances)\n", - "# Assuming plot_testing is defined, call it to display results\n", - "plot_testing(results_seed,discrimination_seed, 1, \"Seed\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0d18302e-4657-4732-b6ef-f7439d2bb2fd", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_First_order_network\")" - ] - }, - { - "cell_type": "markdown", - "id": "8a694f1e-3f32-48fc-bce8-0b544d43ca62", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "## Bonus coding section 2: Plot surfaces for content / awareness inferences\n", - "\n", - "To explore the properties of the HOSS model, we can simulate inference at different levels of the hierarchy over the full 2D space of possible input X's. The left panel below shows that the probability of awareness (of any stimulus contents) rises in a graded manner from the lower left corner of the graph (low activation of any feature) to the upper right (high activation of both features). 
In contrast, the right panel shows that confidence in making a discrimination response (e.g. rightward vs. leftward) increases away from the major diagonal, as the model becomes sure that the sample was generated by either a leftward or rightward tilted stimulus.\n", - "\n", - "Together, the two surfaces make predictions about the relationships we might see between discrimination confidence and awareness in a simple psychophysics experiment. One notable prediction is that discrimination could still be possible - and lead to some degree of confidence - even when the higher-order node is \"reporting\" unawareness of the stimulus.\n", - "\n", - "Now, let's get hands on and plot those auto-generated patterns!\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "31503073-a7c0-4502-8d94-5ffa47a22926", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the grid\n", - "xgrid = np.arange(0, 2.01, 0.01)\n", - "\n", - "# Define the means for the Gaussian distributions\n", - "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", - "\n", - "# Define the covariance matrix\n", - "Sigma = np.array([[1, 0], [0, 1]])\n", - "\n", - "# Prior probabilities\n", - "Wprior = np.array([0.5, 0.5])\n", - "Aprior = 0.5\n", - "\n", - "# Initialize arrays to hold confidence and posterior probability\n", - "confW = np.zeros((len(xgrid), len(xgrid)))\n", - "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", - "\n", - "# Compute confidence and posterior probability for each point in the grid\n", - "for i, xi in tqdm(enumerate(xgrid), total=len(xgrid), desc='Outer Loop'):\n", - " for j, xj in enumerate(xgrid):\n", - " X = [xi, xj]\n", - " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", - " confW[i, j] = max(post_w[1], post_w[2])\n", - " posteriorAware[i, j] = post_A[1]\n", - "\n", - "with 
plt.xkcd():\n", - "\n", - " # Plotting\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # Posterior probability \"seen\"\n", - " plt.subplot(1, 2, 1)\n", - " plt.contourf(xgrid, xgrid, posteriorAware.T)\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('Posterior probability \"seen\"')\n", - " plt.axis('square')\n", - "\n", - " # Confidence in identity\n", - " plt.subplot(1, 2, 2)\n", - " contour_set = plt.contourf(xgrid, xgrid, confW.T)\n", - " plt.colorbar()\n", - " plt.contour(xgrid, xgrid, posteriorAware.T, levels=[0.5], linewidths=4, colors=['white']) # Line contour for threshold\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('Confidence in identity')\n", - " plt.axis('square')\n", - "\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "2d129657-62aa-42d1-970a-93fd67736b69", - "metadata": { - "execution": {} - }, - "source": [ - "**Simulate KL-divergence surfaces**\n", - "\n", - "We can also simulate KL-divergences (a measure of Bayesian surprise) at each layer in the network, which under predictive coding models of the brain, has been proposed to scale with neural activation (e.g., Friston, 2005; Summerfield & de Lange, 2014)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "66044263-c8de-49a9-a56b-2e7336cc737c", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the grid\n", - "xgrid = np.arange(0, 2.01, 0.01)\n", - "\n", - "# Define the means for the Gaussian distributions\n", - "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", - "\n", - "# Define the covariance matrix\n", - "Sigma = np.array([[1, 0], [0, 1]])\n", - "\n", - "# Prior probabilities\n", - "Wprior = np.array([0.5, 0.5])\n", - "Aprior = 0.5\n", - "\n", - "# Initialize arrays to hold confidence and posterior probability\n", - "confW = np.zeros((len(xgrid), len(xgrid)))\n", - "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", - "\n", - "# Compute confidence and posterior probability for each point in the grid\n", - "for i, xi in enumerate(xgrid):\n", - " for j, xj in enumerate(xgrid):\n", - " X = [xi, xj]\n", - " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", - "\n", - " confW[i, j] = max(post_w[1], post_w[2])\n", - " posteriorAware[i, j] = post_A[1]\n", - "\n", - "# Calculate the mean K-L divergence for absent and present awareness states\n", - "KL_A_absent = np.mean(KL_A[posteriorAware < 0.5])\n", - "KL_A_present = np.mean(KL_A[posteriorAware >= 0.5])\n", - "KL_w_absent = np.mean(KL_w[posteriorAware < 0.5])\n", - "KL_w_present = np.mean(KL_w[posteriorAware >= 0.5])\n", - "\n", - "with plt.xkcd():\n", - "\n", - " # Plotting\n", - " plt.figure(figsize=(18, 6))\n", - "\n", - " # K-L divergence, perceptual states\n", - " plt.subplot(1, 2, 1)\n", - " plt.contourf(xgrid, xgrid, KL_w.T, cmap='viridis')\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('KL-divergence, perceptual states')\n", - " plt.axis('square')\n", - "\n", - " # K-L divergence, awareness state\n", - " plt.subplot(1, 2, 
2)\n", - " plt.contourf(xgrid, xgrid, KL_A.T, cmap='viridis')\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('KL-divergence, awareness state')\n", - " plt.axis('square')\n", - "\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "b32e4908-0f6f-4259-832f-045adcb19700", - "metadata": { - "execution": {} - }, - "source": [ - "### Discussion point\n", - "\n", - "Can you recognise the difference between the KL divergence for the W-level and the one for the A-level?" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7d8deb66-9a1d-49e1-a3ef-96970efa8d97", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# to_remove explanation\n", - "\"\"\"\n", - "At the level of perceptual states W, there is a substantial asymmetry in the KL-divergence expected when the\n", - "model says ‘seen’ vs. ‘unseen’ (lefthand panel). This is due to the large belief updates invoked in the\n", - "perceptual layer W by samples that deviate from the lower lefthand corner - from absence. In contrast, when\n", - "we compute KL-divergence for the A-level (righthand panel), the level of prediction error is symmetric across\n", - "seen and unseen decisions, leading to \"hot\" zones both at the upper righthand (present) and lower lefthand\n", - "(absent) corners of the 2D space.\n", - "\n", - "Intuitively, this means that at the W-level, there's a noticeable difference in the KL-divergence values\n", - "between \"seen\" and \"unseen\" predictions. This large difference is mainly due to significant updates in the\n", - "model's beliefs at this level when the detected samples are far from what is expected under the condition of\n", - "\"absence.\" However, when we analyze the K-L divergence at the A-level, the discrepancies in prediction errors\n", - "between \"seen\" and \"unseen\" are balanced. 
This creates equally strong responses in the model, whether something\n", - "is detected or not detected.\n", - "\n", - "We can also sort the KL-divergences as a function of whether the model \"reported\" presence or absence. As\n", - "can be seen in the bar plots below, there is more asymmetry in the prediction error at the W compared to the\n", - "A levels.\n", - "\n", - "\"\"\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "869fc8f1-4199-4525-80b3-26e74babc66a", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "with plt.xkcd():\n", - "\n", - " # Create figure with specified size\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # KL divergence for W states\n", - " plt.subplot(1, 2, 1)\n", - " plt.bar(['unseen', 'seen'], [KL_w_absent, KL_w_present], color='k')\n", - " plt.ylabel('KL divergence, W states')\n", - " plt.xticks(fontsize=18)\n", - " plt.yticks(fontsize=18)\n", - "\n", - " # KL divergence for A states\n", - " plt.subplot(1, 2, 2)\n", - " plt.bar(['unseen', 'seen'], [KL_A_absent, KL_A_present], color='k')\n", - " plt.ylabel('KL divergence, A states')\n", - " plt.xticks(fontsize=18)\n", - " plt.yticks(fontsize=18)\n", - "\n", - " plt.tight_layout()\n", - "\n", - " # Show plot\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "64ecb92c-bfe3-4e49-bd40-f11ffa685ece", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_HOSS_Bonus_Content\")" - ] - }, - { - "cell_type": "markdown", - "id": "bcd87344-d473-44af-a881-b68e5471d353", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "# Discussion Bonus Section\n", - "This section contains an extra discussion exercise if you have time and inclination." 
- ] - }, - { - "cell_type": "markdown", - "id": "ca33829c-8d54-437e-ba33-d3003af51d7a", - "metadata": { - "execution": {} - }, - "source": [ - "In this bonus section, Megan and Anil will delve into the complexities of defining and testing for consciousness, particularly in the context of artificial intelligence. We will explore various theoretical perspectives, examine classic and contemporary tests for consciousness, and discuss the challenges and ethical implications of determining whether a system truly possesses conscious experience." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "21b621ea-1639-4131-8ec3-9cdf34a64f77", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 11: Consciousness Bonus Content\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 
'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "video_ids = [('Youtube', '00dL8q7WgcU'), ('Bilibili', 'BV12n4y1Q7C2')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "39c202fb-f580-4a96-8f8e-bad24ed1d55c", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_Video_11\")" - ] - }, - { - "cell_type": "markdown", - "id": "e9e839f2-e237-4ed4-9045-56dc7b5f6d60", - "metadata": { - "execution": {} - }, - "source": [ - "## Discussion activity: Is it actually conscious?" - ] - }, - { - "cell_type": "markdown", - "id": "2720c0b5-6386-43a6-9647-f1245531c376", - "metadata": { - "execution": {} - }, - "source": [ - "We discussed the difference between these two...\n", - "- \"Forward\" tests: passing means the machine is conscious (or intelligent).\n", - "- \"Reverse\" tests: passing means humans are convinced that a machine is conscious (or intelligent).\n", - "\n", - "**Discuss!** If a system (AI, other animal, other human) exhibited all the \"right signs\" of being conscious, how can we know for sure it is actually conscious? How could you design a test to be a true forward test?\n", - "\n", - "- Room 1: I think you could design a forward test in this way... 
[share your ideas]\n", - "- Room 2: I think a forward test is impossible, and here's why [share your ideas]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "84958157-c165-4cc3-be76-408999cf44ad", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_Discussion_activity\")" + "For the moment, let's switch to our second topic on the day, arguably one of the most important topics we've covered so far: Ethics." ] } ], @@ -4652,7 +3818,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial2.ipynb b/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial2.ipynb index ad486f06a..099132da9 100644 --- a/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial2.ipynb +++ b/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial2.ipynb @@ -25,9 +25,9 @@ "\n", "__Content creators:__ Megan Peters, Joshua Shepherd, Jana Schaich Borg\n", "\n", - "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Alex Murphy\n", "\n", - "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk\n" + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Alex Murphy\n" ] }, { @@ -542,7 +542,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial3.ipynb b/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial3.ipynb new file mode 100644 index 000000000..b6e3b41e7 --- /dev/null +++ 
b/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial3.ipynb @@ -0,0 +1,2406 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "89a00b06-154b-4aaf-8bee-b96a675406b5", + "metadata": { + "execution": {} + }, + "source": [ + "<a href=\"https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a> &nbsp; <a href=\"https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/instructor/W2D5_Tutorial3.ipynb\" target=\"_parent\"><img src=\"https://kaggle.com/static/images/open-in-kaggle.svg\" alt=\"Open in Kaggle\"/></a>" + ] + }, + { + "cell_type": "markdown", + "id": "82ed61a3-87d2-4e76-83f6-4b786c101af2", + "metadata": { + "execution": {} + }, + "source": [ + "# (Bonus) Tutorial 3: Consciousness (Extended)\n", + "\n", + "**Week 2, Day 5: Mysteries**\n", + "\n", + "**By Neuromatch Academy**\n", + "\n", + "__Content creators:__ Steve Fleming, Guillaume Dumas, Samuele Bolotta, Juan David Vargas, Hakwan Lau, Anil Seth, Megan Peters\n", + "\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault, Alex Murphy\n", + "\n", + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault, Alex Murphy\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7861818a", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Install and import feedback gadget\n", + "\n", + "!pip install vibecheck numpy matplotlib Pillow torch torchvision transformers ipywidgets gradio trdg scikit-learn networkx pickleshare seaborn tabulate --quiet\n", + "\n", + "from vibecheck import DatatopsContentReviewContainer\n", + "def content_review(notebook_section: str):\n", + " return DatatopsContentReviewContainer(\n", + " \"\", # No text prompt\n", + " notebook_section,\n", + " {\n", + " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", + " \"name\": \"neuromatch_neuroai\",\n", + " \"user_key\": \"wb2cxze8\",\n", + " },\n", + " ).render()\n", + "\n", + "feedback_prefix = \"W2D5_T3\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4c4e3a7d", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + 
"source": [ + "# @title Import dependencies\n", + "# @markdown\n", + "\n", + "import contextlib\n", + "import io\n", + "\n", + "with contextlib.redirect_stdout(io.StringIO()):\n", + " # Standard Libraries\n", + " import copy\n", + " import logging\n", + " import os\n", + " import random\n", + " import requests\n", + "\n", + " # Data Handling and Visualization Libraries\n", + " import numpy as np\n", + " import pandas as pd\n", + " import matplotlib.pyplot as plt\n", + " import seaborn as sns\n", + " from sklearn.metrics import precision_score, recall_score, fbeta_score\n", + " from sklearn.linear_model import LinearRegression\n", + " from tabulate import tabulate\n", + "\n", + " # Scientific Computing and Statistical Libraries\n", + " from numpy.linalg import inv\n", + " from scipy.special import logsumexp\n", + " from scipy.stats import multivariate_normal\n", + "\n", + " # Deep Learning Libraries\n", + " import torch\n", + " from torch import nn, optim, save, load\n", + " from torch.nn import functional as F\n", + " from torch.utils.data import DataLoader\n", + " import torch.nn.init as init\n", + " from torch.optim.lr_scheduler import StepLR\n", + "\n", + " # Image Processing Libraries\n", + " from PIL import Image\n", + " from matplotlib.patches import Patch\n", + " from mpl_toolkits.mplot3d import Axes3D\n", + "\n", + " # Interactive Elements and Web Applications\n", + " from IPython.display import IFrame\n", + " from IPython.display import Image as IMG\n", + " import gradio as gr\n", + " import ipywidgets as widgets\n", + " from ipywidgets import interact, IntSlider\n", + "\n", + " # Graph Analysis Libraries\n", + " import networkx as nx\n", + "\n", + " # Progress Monitoring Libraries\n", + " from tqdm import tqdm\n", + "\n", + " # Utilities and Miscellaneous Libraries\n", + " from itertools import product\n", + "\n", + " import math\n", + " !pip install torch_optimizer\n", + " import torch_optimizer as optim2" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "00f889a6", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Figure settings\n", + "# @markdown\n", + "\n", + "logging.getLogger('matplotlib.font_manager').disabled = True\n", + "\n", + "%matplotlib inline\n", + "%config InlineBackend.figure_format = 'retina' # perform high-definition rendering for images and plots\n", + "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98ca7c55", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Set device (GPU or CPU)\n", + "\n", + "def set_device():\n", + " \"\"\"\n", + " Determines and sets the computational device for PyTorch operations based on the availability of a CUDA-capable GPU.\n", + "\n", + " Outputs:\n", + " - device (str): The device that PyTorch will use for computations ('cuda' or 'cpu'). This string can be directly used\n", + " in PyTorch operations to specify the device.\n", + " \"\"\"\n", + "\n", + " device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", + " if device != \"cuda\":\n", + " print(\"GPU is not enabled in this notebook. \\n\"\n", + " \"If you want to enable it, in the menu under `Runtime` -> \\n\"\n", + " \"`Hardware accelerator.` and select `GPU` from the dropdown menu\")\n", + " else:\n", + " print(\"GPU is enabled in this notebook. 
\\n\"\n", + " \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n", + " \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n", + "\n", + " return device\n", + "\n", + "device = set_device()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2508d8b4", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Helper functions\n", + "\n", + "mse_loss = nn.BCELoss(reduction='sum') # summed binary cross-entropy; name kept for downstream compatibility\n", + "\n", + "lam = 1e-4\n", + "\n", + "from torch.autograd import Variable\n", + "\n", + "def CAE_loss(W, x, recons_x, h, lam):\n", + " \"\"\"Compute the Contractive AutoEncoder Loss\n", + "\n", + " Evaluates the CAE loss, computed as the sum of a reconstruction\n", + " error (binary cross-entropy here) and the weighted l2-norm of the\n", + " Jacobian of the hidden units with respect to the inputs.\n", + "\n", + "\n", + " See reference below for an in-depth discussion:\n", + " #1: http://wiseodd.github.io/techblog/2016/12/05/contractive-autoencoder\n", + "\n", + " Args:\n", + " `W` (FloatTensor): (N_hidden x N), where N_hidden and N are the\n", + " dimensions of the hidden units and input respectively.\n", + " `x` (Variable): the input to the network, with dims (N_batch x N)\n", + " recons_x (Variable): the reconstruction of the input, with dims\n", + " N_batch x N.\n", + " `h` (Variable): the hidden units of the network, with dims\n", + " batch_size x N_hidden\n", + " `lam` (float): the weight given to the jacobian regulariser term\n", + "\n", + " Returns:\n", + " Variable: the (scalar) CAE loss\n", + " \"\"\"\n", + " mse = mse_loss(recons_x, x)\n", + " # Since: W is shape of N_hidden x N. 
So, we do not need to transpose it as\n", + " # opposed to #1\n", + " dh = h * (1 - h) # Hadamard product produces size N_batch x N_hidden\n", + " # Sum through the input dimension to improve efficiency, as suggested in #1\n", + " w_sum = torch.sum(Variable(W)**2, dim=1)\n", + " # unsqueeze to avoid issues with torch.mv\n", + " w_sum = w_sum.unsqueeze(1) # shape N_hidden x 1\n", + " contractive_loss = torch.sum(torch.mm(dh**2, w_sum), 0)\n", + " return mse + contractive_loss.mul_(lam)\n", + "\n", + "class FirstOrderNetwork(nn.Module):\n", + " def __init__(self, hidden_units, data_factor, use_gelu):\n", + " \"\"\"\n", + " Initializes the FirstOrderNetwork with specific configurations.\n", + "\n", + " Parameters:\n", + " - hidden_units (int): The number of units in the hidden layer.\n", + " - data_factor (int): Factor to scale the amount of data processed.\n", + " A factor of 1 indicates the default data amount,\n", + " while 10 indicates 10 times the default amount.\n", + " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", + " \"\"\"\n", + " super(FirstOrderNetwork, self).__init__()\n", + "\n", + " # Define the encoder, hidden, and decoder layers with specified units\n", + "\n", + " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", + " self.hidden= nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", + " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", + "\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + "\n", + "\n", + " # Dropout layer to prevent overfitting\n", + " self.dropout = nn.Dropout(0.1)\n", + "\n", + " # Set the data factor\n", + " self.data_factor = data_factor\n", + "\n", + " # Other activation functions for various purposes\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize network weights\n", + " self.initialize_weights()\n", + "\n", + " def initialize_weights(self):\n", + " \"\"\"Initializes weights of the encoder, hidden, 
and decoder layers uniformly.\"\"\"\n", + " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", + " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", + " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", + "\n", + " def encoder(self, x):\n", + " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", + " return h1\n", + "\n", + " def decoder(self,z):\n", + " #h2 = self.relu(self.hidden(z))\n", + " h2 = self.sigmoid(self.fc2(z))\n", + " return h2\n", + "\n", + " def forward(self, x):\n", + " \"\"\"\n", + " Defines the forward pass through the network.\n", + "\n", + " Parameters:\n", + " - x (Tensor): The input tensor to the network.\n", + "\n", + " Returns:\n", + " - Tensor: The output of the network after passing through the layers and activations.\n", + " \"\"\"\n", + " h1 = self.encoder(x)\n", + " h2 = self.decoder(h1)\n", + "\n", + " return h1 , h2\n", + "\n", + "class SecondOrderNetwork(nn.Module):\n", + " def __init__(self, use_gelu):\n", + " super(SecondOrderNetwork, self).__init__()\n", + " # Define a linear layer for comparing the difference between input and output of the first-order network\n", + " self.comparison_layer = nn.Linear(100, 100)\n", + "\n", + " # Linear layer for determining wagers, mapping from 100 features to a single output\n", + " self.wager = nn.Linear(100, 1)\n", + "\n", + " # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " # Select activation function based on the `use_gelu` flag\n", + " self.activation = torch.relu\n", + "\n", + " # Additional activation functions for potential use in network operations\n", + " self.sigmoid = torch.sigmoid\n", + "\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize the weights of the network\n", + " self._init_weights()\n", + "\n", + " def _init_weights(self):\n", + " # Uniformly initialize weights for the comparison and wager layers\n", + " 
init.uniform_(self.comparison_layer.weight, -1.0, 1.0)\n", + " init.uniform_(self.wager.weight, 0.0, 0.1)\n", + "\n", + " def forward(self, first_order_input, first_order_output):\n", + " # Calculate the difference between the first-order input and output\n", + " comparison_matrix = first_order_input - first_order_output\n", + "\n", + " #Another option is to directly calculate the per unit MSE to use as input for the comparator matrix\n", + " #comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input)\n", + "\n", + " # Pass the difference through the comparison layer and apply the chosen activation function\n", + " comparison_out=self.dropout(self.activation(self.comparison_layer(comparison_matrix)))\n", + "\n", + " # Calculate the wager value, applying dropout and sigmoid activation to the output of the wager layer\n", + " wager = self.sigmoid(self.wager(comparison_out))\n", + "\n", + " return wager\n", + "\n", + "def initialize_global():\n", + " global Input_Size_1, Hidden_Size_1, Output_Size_1, Input_Size_2\n", + " global num_units, patterns_number\n", + " global learning_rate_2, momentum, temperature , Threshold\n", + " global First_set, Second_set, Third_set\n", + " global First_set_targets, Second_set_targets, Third_set_targets\n", + " global epoch_list, epoch_1_order, epoch_2_order, patterns_matrix1\n", + " global testing_graph_names\n", + "\n", + " global optimizer ,n_epochs , learning_rate_1\n", + " learning_rate_1 = 0.5\n", + " n_epochs = 100\n", + " optimizer=\"ADAMAX\"\n", + "\n", + " # Network sizes\n", + " Input_Size_1 = 100\n", + " Hidden_Size_1 = 60\n", + " Output_Size_1 = 100\n", + " Input_Size_2 = 100\n", + "\n", + " # Patterns\n", + " num_units = 100\n", + " patterns_number = 200\n", + "\n", + " # Pre-training and hyperparameters\n", + " learning_rate_2 = 0.1\n", + " momentum = 0.9\n", + " temperature = 1.0\n", + " Threshold=0.5\n", + "\n", + " # Testing\n", + " First_set = []\n", + " Second_set = []\n", + " Third_set 
= []\n", + " First_set_targets = []\n", + " Second_set_targets = []\n", + " Third_set_targets = []\n", + "\n", + " # Graphic of pretraining\n", + " epoch_list = list(range(1, n_epochs + 1))\n", + " epoch_1_order = np.zeros(n_epochs)\n", + " epoch_2_order = np.zeros(n_epochs)\n", + " patterns_matrix1 = torch.zeros((n_epochs, patterns_number), device=device) # Initialize patterns_matrix as a PyTorch tensor on the GPU\n", + "\n", + "def compute_metrics(TP, TN, FP, FN):\n", + " \"\"\"Compute precision, recall, F1 score, and accuracy.\"\"\"\n", + " precision = round(TP / (TP + FP), 2) if (TP + FP) > 0 else 0\n", + " recall = round(TP / (TP + FN), 2) if (TP + FN) > 0 else 0\n", + " f1_score = round(2 * (precision * recall) / (precision + recall), 2) if (precision + recall) > 0 else 0\n", + " accuracy = round((TP + TN) / (TP + TN + FP + FN), 2) if (TP + TN + FP + FN) > 0 else 0\n", + " return precision, recall, f1_score, accuracy\n", + "\n", + "# define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "def prepare_pre_training(hidden,factor,gelu,stepsize, gam):\n", + "\n", + " first_order_network = FirstOrderNetwork(hidden, factor, gelu).to(device)\n", + " second_order_network = SecondOrderNetwork(gelu).to(device)\n", + "\n", + " criterion_1 = CAE_loss\n", + " criterion_2 = nn.BCELoss(size_average = False)\n", + "\n", + "\n", + " if optimizer == \"ADAM\":\n", + " optimizer_1 = optim.Adam(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.Adam(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"SGD\":\n", + " optimizer_1 = optim.SGD(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.SGD(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"SWATS\":\n", + " optimizer_1 = optim2.SWATS(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = 
optim2.SWATS(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"ADAMW\":\n", + " optimizer_1 = optim.AdamW(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.AdamW(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"RMS\":\n", + " optimizer_1 = optim.RMSprop(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.RMSprop(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"ADAMAX\":\n", + " optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.Adamax(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " # Learning rate schedulers\n", + " scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", + " scheduler_2 = StepLR(optimizer_2, step_size=stepsize, gamma=gam)\n", + "\n", + " return first_order_network, second_order_network, criterion_1 , criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2\n", + "\n", + "def title(string):\n", + " # Enable XKCD plot styling\n", + " with plt.xkcd():\n", + " # Create a figure and an axes.\n", + " fig, ax = plt.subplots()\n", + "\n", + " # Create a rectangle patch with specified dimensions and styles\n", + " rectangle = patches.Rectangle((0.05, 0.1), 0.9, 0.4, linewidth=1, edgecolor='r', facecolor='blue', alpha=0.5)\n", + " ax.add_patch(rectangle)\n", + "\n", + " # Place text inside the rectangle, centered\n", + " plt.text(0.5, 0.3, string, horizontalalignment='center', verticalalignment='center', fontsize=26, color='white')\n", + "\n", + " # Set plot limits\n", + " ax.set_xlim(0, 1)\n", + " ax.set_ylim(0, 1)\n", + "\n", + " # Disable axis display\n", + " ax.axis('off')\n", + "\n", + " # Display the plot\n", + " plt.show()\n", + "\n", + " # Close the figure to free up memory\n", + " plt.close(fig)\n", + "\n", + "# Function to configure the training environment and load 
the models\n", + "def get_test_patterns(factor):\n", + " \"\"\"\n", + " Generates testing patterns for the three stimulus conditions used during evaluation.\n", + "\n", + " Returns:\n", + " - Tuple of testing patterns, number of samples in the testing patterns\n", + " \"\"\"\n", + " # Generating testing patterns for three different sets\n", + " first_set, first_set_targets = create_patterns(0, factor)\n", + " second_set, second_set_targets = create_patterns(1, factor)\n", + " third_set, third_set_targets = create_patterns(2, factor)\n", + "\n", + " # Aggregate testing patterns and their targets for ease of access\n", + " testing_patterns = [[first_set, first_set_targets], [second_set, second_set_targets], [third_set, third_set_targets]]\n", + "\n", + " # Determine the number of samples from the first set (assumed consistent across all sets)\n", + " n_samples = len(testing_patterns[0][0])\n", + "\n", + " return testing_patterns, n_samples\n", + "\n", + "# Function to plot the input and output of the first-order network\n", + "def plot_input_output(input_data, output_data, index):\n", + " fig, axes = plt.subplots(1, 2, figsize=(10, 6))\n", + "\n", + " # Plot input data\n", + " im1 = axes[0].imshow(input_data.cpu().numpy(), aspect='auto', cmap='viridis')\n", + " axes[0].set_title('Input')\n", + " fig.colorbar(im1, ax=axes[0])\n", + "\n", + " # Plot output data\n", + " im2 = axes[1].imshow(output_data.cpu().numpy(), aspect='auto', cmap='viridis')\n", + " axes[1].set_title('Output')\n", + " fig.colorbar(im2, ax=axes[1])\n", + "\n", + " plt.suptitle(f'Testing Pattern {index+1}')\n", + " plt.show()\n", + "\n", + "# Function to test the model using the configured testing patterns\n", + "def testing(testing_patterns, n_samples, loaded_model, loaded_model_2, factor):\n", + "\n", + " def generate_chance_level(shape):\n", + " chance_level = 
np.random.rand(*shape).tolist()\n", + " return chance_level\n", + "\n", + " results_for_plotting = []\n", + " max_values_output_first_order = []\n", + " max_indices_output_first_order = []\n", + " max_values_patterns_tensor = []\n", + " max_indices_patterns_tensor = []\n", + " f1_scores_wager = []\n", + "\n", + " mse_losses_indices = []\n", + " mse_losses_values = []\n", + " discrimination_performances = []\n", + "\n", + " # Iterate through each set of testing patterns and targets\n", + " for i in range(len(testing_patterns)):\n", + " with torch.no_grad(): # Ensure no gradients are computed during testing\n", + "\n", + " # For the low-vision condition the stimulus intensity is scaled by 0.3 (as can be seen in the generate_patterns function), so a lower wager threshold is used\n", + " threshold = 0.5\n", + " if i == 2:\n", + " threshold = 0.15\n", + "\n", + " # Obtain output from the first order model\n", + " input_data = testing_patterns[i][0]\n", + " hidden_representation, output_first_order = loaded_model(input_data)\n", + " output_second_order = loaded_model_2(input_data, output_first_order)\n", + "\n", + " delta = 100 * factor\n", + "\n", + " print(\"discriminator\")\n", + " print((output_first_order[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean())\n", + " discrimination_performance = round((output_first_order[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", + " discrimination_performances.append(discrimination_performance)\n", + "\n", + " chance_level = torch.Tensor(generate_chance_level((200 * factor, 100))).to(device)\n", + " discrimination_random = round((chance_level[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", + " print(\"chance level\", discrimination_random)\n", + "\n", + " # Count all patterns in the dataset\n", + " wagers = output_second_order[delta:].cpu()\n", + "\n", + " _, targets_2 = torch.max(testing_patterns[i][1], 1)\n", + " targets_2 = 
targets_2[delta:].cpu()\n", + "\n", + " # Convert targets to binary classification for wagering scenario\n", + " targets_2 = (targets_2 > 0).int()\n", + "\n", + " # Convert tensors to NumPy arrays for metric calculations\n", + " predicted_np = wagers.numpy().flatten()\n", + " targets_2_np = targets_2.numpy()\n", + "\n", + " #print(\"number of targets,\" , len(targets_2_np))\n", + "\n", + " print(predicted_np)\n", + " print(targets_2_np)\n", + "\n", + " # Calculate True Positives, True Negatives, False Positives, and False Negatives\n", + " TP = np.sum((predicted_np > threshold) & (targets_2_np > threshold))\n", + " TN = np.sum((predicted_np < threshold ) & (targets_2_np < threshold))\n", + " FP = np.sum((predicted_np > threshold) & (targets_2_np < threshold))\n", + " FN = np.sum((predicted_np < threshold) & (targets_2_np > threshold))\n", + "\n", + " # Compute precision, recall, F1 score, and accuracy for both high and low wager scenarios\n", + " precision_h, recall_h, f1_score_h, accuracy_h = compute_metrics(TP, TN, FP, FN)\n", + "\n", + " f1_scores_wager.append(f1_score_h)\n", + "\n", + " # Collect results for plotting\n", + " results_for_plotting.append({\n", + " \"counts\": [[TP, FP, TP + FP]],\n", + " \"metrics\": [[precision_h, recall_h, f1_score_h, accuracy_h]],\n", + " \"title_results\": f\"Results Table - Set {i+1}\",\n", + " \"title_metrics\": f\"Metrics Table - Set {i+1}\"\n", + " })\n", + "\n", + " # Plot input and output of the first-order network\n", + " plot_input_output(input_data, output_first_order, i)\n", + "\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " max_vals_pat, max_inds_pat = torch.max(input_data[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " 
max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + " fig, axs = plt.subplots(1, 2, figsize=(15, 5))\n", + "\n", + " # Scatter plot of indices: patterns_tensor vs. output_first_order\n", + " axs[0].scatter(max_indices_patterns_tensor[i], max_indices_output_first_order[i], alpha=0.5)\n", + " axs[0].set_title(f'Stimuli location: Condition {i+1} - First Order Input vs. First Order Output')\n", + " axs[0].set_xlabel('First Order Input Indices')\n", + " axs[0].set_ylabel('First Order Output Indices')\n", + "\n", + " # Add quadratic fit to scatter plot\n", + " x_indices = max_indices_patterns_tensor[i]\n", + " y_indices = max_indices_output_first_order[i]\n", + " y_pred_indices = perform_quadratic_regression(x_indices, y_indices)\n", + " axs[0].plot(x_indices, y_pred_indices, color='skyblue')\n", + "\n", + "\n", + " # Calculate MSE loss for indices\n", + " mse_loss_indices = np.mean((np.array(x_indices) - np.array(y_indices)) ** 2)\n", + " mse_losses_indices.append(mse_loss_indices)\n", + "\n", + " # Scatter plot of values: patterns_tensor vs. output_first_order\n", + " axs[1].scatter(max_values_patterns_tensor[i], max_values_output_first_order[i], alpha=0.5)\n", + " axs[1].set_title(f'Stimuli Values: Condition {i+1} - First Order Input vs. 
First Order Output')\n", + " axs[1].set_xlabel('First Order Input Values')\n", + " axs[1].set_ylabel('First Order Output Values')\n", + "\n", + " # Add quadratic fit to scatter plot\n", + " x_values = max_values_patterns_tensor[i]\n", + " y_values = max_values_output_first_order[i]\n", + " y_pred_values = perform_quadratic_regression(x_values, y_values)\n", + " axs[1].plot(x_values, y_pred_values, color='skyblue')\n", + "\n", + " # Calculate MSE loss for values\n", + " mse_loss_values = np.mean((np.array(x_values) - np.array(y_values)) ** 2)\n", + " mse_losses_values.append(mse_loss_values)\n", + "\n", + " plt.tight_layout()\n", + " plt.show()\n", + "\n", + " return f1_scores_wager, mse_losses_indices , mse_losses_values, discrimination_performances, results_for_plotting\n", + "\n", + "def generate_patterns(patterns_number, num_units, factor, condition = 0):\n", + " \"\"\"\n", + " Generates patterns and targets for training the networks\n", + "\n", + " # patterns_number: Number of patterns to generate\n", + " # num_units: Number of units in each pattern\n", + " # pattern: 0: superthreshold, 1: subthreshold, 2: low vision\n", + " # Returns lists of patterns, stimulus present/absent indicators, and second order targets\n", + " \"\"\"\n", + "\n", + " patterns_number= patterns_number*factor\n", + "\n", + " patterns = [] # Store generated patterns\n", + " stim_present = [] # Indicators for when a stimulus is present in the pattern\n", + " stim_absent = [] # Indicators for when no stimulus is present\n", + " order_2_pr = [] # Second order network targets based on the presence or absence of stimulus\n", + "\n", + " if condition == 0:\n", + " random_limit= 0.0\n", + " baseline = 0\n", + " multiplier = 1\n", + "\n", + " if condition == 1:\n", + " random_limit= 0.02\n", + " baseline = 0.0012\n", + " multiplier = 1\n", + "\n", + " if condition == 2:\n", + " random_limit= 0.02\n", + " baseline = 0.0012\n", + " multiplier = 0.3\n", + "\n", + " # Generate patterns, half noise 
and half potential stimuli\n", + " for i in range(patterns_number):\n", + "\n", + " # First half: Noise patterns\n", + " if i < patterns_number // 2:\n", + "\n", + " pattern = multiplier * np.random.uniform(0.0, random_limit, num_units) + baseline # Generate a noise pattern\n", + " patterns.append(pattern)\n", + " stim_present.append(np.zeros(num_units)) # Stimulus absent\n", + " order_2_pr.append([0.0 , 1.0]) # No stimulus, low wager\n", + "\n", + " # Second half: Stimulus patterns\n", + " else:\n", + " stimulus_number = random.randint(0, num_units - 1) # Choose a unit for potential stimulus\n", + " pattern = np.random.uniform(0.0, random_limit, num_units) + baseline\n", + " pattern[stimulus_number] = np.random.uniform(0.0, 1.0) * multiplier # Set stimulus intensity\n", + "\n", + " patterns.append(pattern)\n", + " present = np.zeros(num_units)\n", + " # Determine if stimulus is above discrimination threshold\n", + " if pattern[stimulus_number] >= multiplier/2:\n", + " order_2_pr.append([1.0 , 0.0]) # Stimulus detected, high wager\n", + " present[stimulus_number] = 1.0\n", + " else:\n", + " order_2_pr.append([0.0 , 1.0]) # Stimulus not detected, low wager\n", + " present[stimulus_number] = 0.0\n", + "\n", + " stim_present.append(present)\n", + "\n", + "\n", + " patterns_tensor = torch.Tensor(patterns).to(device).requires_grad_(True)\n", + " stim_present_tensor = torch.Tensor(stim_present).to(device).requires_grad_(True)\n", + " stim_absent_tensor= torch.Tensor(stim_absent).to(device).requires_grad_(True)\n", + " order_2_tensor = torch.Tensor(order_2_pr).to(device).requires_grad_(True)\n", + "\n", + " return patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor\n", + "\n", + "def create_patterns(stimulus,factor):\n", + " \"\"\"\n", + " Generates neural network input patterns based on specified stimulus conditions.\n", + "\n", + " Parameters:\n", + " - stimulus (int): Determines the type of patterns to generate.\n", + " Acceptable values:\n", + " - 
0: Suprathreshold stimulus\n", + " - 1: Subthreshold stimulus\n", + " - 2: Low vision condition\n", + "\n", + " Returns:\n", + " - torch.Tensor: Tensor of generated patterns.\n", + " - torch.Tensor: Tensor of target values corresponding to the generated patterns.\n", + " \"\"\"\n", + "\n", + " # Generate initial patterns and target tensors for base condition.\n", + "\n", + " patterns_tensor, stim_present_tensor, _, _ = generate_patterns(patterns_number, num_units ,factor, stimulus)\n", + " # Convert pattern tensors for processing on specified device (CPU/GPU).\n", + " patterns = torch.Tensor(patterns_tensor).to(device)\n", + " targets = torch.Tensor(stim_present_tensor).to(device)\n", + "\n", + " return patterns, targets\n", + "\n", + "def pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta):\n", + " \"\"\"\n", + " Conducts pre-training for first-order and second-order networks.\n", + "\n", + " Parameters:\n", + " - first_order_network (torch.nn.Module): Network for basic input-output mapping.\n", + " - second_order_network (torch.nn.Module): Network for decision-making based on the first network's output.\n", + " - criterion_1, criterion_2 (torch.nn): Loss functions for the respective networks.\n", + " - optimizer_1, optimizer_2 (torch.optim): Optimizers for the respective networks.\n", + " - scheduler_1, scheduler_2 (torch.optim.lr_scheduler): Schedulers for learning rate adjustment.\n", + " - factor (float): Parameter influencing data augmentation or pattern generation.\n", + " - meta (bool): Flag indicating the use of meta-learning strategies.\n", + "\n", + " Returns:\n", + " Tuple containing updated networks and epoch-wise loss records.\n", + "\n", + " \"\"\"\n", + " def get_num_args(func):\n", + " return func.__code__.co_argcount\n", + "\n", + " max_values_output_first_order = []\n", + " max_indices_output_first_order = []\n", + " max_values_patterns_tensor = []\n", + 
" max_indices_patterns_tensor = []\n", + "\n", + " epoch_1_order = np.zeros(n_epochs)\n", + " epoch_2_order = np.zeros(n_epochs)\n", + "\n", + " for epoch in range(n_epochs):\n", + " # Generate training patterns and targets for each epoch\n", + " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", + "\n", + " # Forward pass through the first-order network\n", + " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", + "\n", + " patterns_tensor=patterns_tensor.requires_grad_(True)\n", + " output_first_order=output_first_order.requires_grad_(True)\n", + "\n", + " # Get max values and indices for output_first_order\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " # Get max values and indices for patterns_tensor\n", + " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + " optimizer_1.zero_grad()\n", + "\n", + " # Conditionally execute the second-order network pass and related operations\n", + " if meta:\n", + "\n", + " # Forward pass through the second-order network with inputs from the first-order network\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the second-order network (wagering decision based on comparison)\n", + " loss_2 = criterion_2(output_second_order.squeeze(), order_2_tensor[:, 0])\n", + "\n", + " optimizer_2.zero_grad()\n", + "\n", + "\n", + " # Backpropagate the second-order network's loss\n", + " loss_2.backward(retain_graph=True) 
# Allows further backpropagation for loss_1 after loss_2\n", + "\n", + " # Update second-order network weights\n", + " optimizer_2.step()\n", + "\n", + " scheduler_2.step()\n", + "\n", + " epoch_2_order[epoch] = loss_2.item()\n", + " else:\n", + " # Skip gradient computations for the second-order network\n", + " with torch.no_grad():\n", + " # Forward pass through the second-order network without tracking gradients\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", + " num_args = get_num_args(criterion_1)\n", + "\n", + " if num_args == 2:\n", + " loss_1 = criterion_1(output_first_order, stim_present_tensor)\n", + " else:\n", + " W = first_order_network.state_dict()['fc1.weight']\n", + " loss_1 = criterion_1(W, stim_present_tensor.view(-1, 100), output_first_order,\n", + " hidden_representation, lam)\n", + "\n", + " # Backpropagate the first-order network's loss\n", + " loss_1.backward()\n", + "\n", + " # Update first-order network weights\n", + " optimizer_1.step()\n", + "\n", + " # Update the first-order scheduler\n", + " scheduler_1.step()\n", + "\n", + " epoch_1_order[epoch] = loss_1.item()\n", + " #epoch_1_order[epoch] = loss_location.item()\n", + "\n", + " return first_order_network, second_order_network, epoch_1_order, epoch_2_order, (max_values_output_first_order[-1],\n", + " max_indices_output_first_order[-1],\n", + " max_values_patterns_tensor[-1],\n", + " max_indices_patterns_tensor[-1])\n", + "\n", + "def HOSS_evaluate(X, mu, Sigma, Aprior, Wprior):\n", + " \"\"\"\n", + " Inference on 2D Bayes net for asymmetric inference on presence vs. 
absence.\n", + " \"\"\"\n", + "\n", + " # Initialise variables and conditional prob tables\n", + " p_A = np.array([1 - Aprior, Aprior]) # prior on awareness state A\n", + " p_W_a1 = np.append(0, Wprior) # likelihood of world states W given aware, first entry is absence\n", + " p_W_a0 = np.append(1, np.zeros(len(Wprior))) # likelihood of world states W given unaware, first entry is absence\n", + " p_W = (p_W_a1 + p_W_a0) / 2 # prior on W marginalising over A (for KL)\n", + "\n", + " # Compute likelihood of observed X for each possible W (P(X|mu_w, Sigma))\n", + " lik_X_W = np.array([multivariate_normal.pdf(X, mean=mu_i, cov=Sigma) for mu_i in mu])\n", + " p_X_W = lik_X_W / lik_X_W.sum() # normalise to get P(X|W)\n", + "\n", + " # Combine with likelihood of each world state w given awareness state A\n", + " lik_W_A = np.vstack((p_X_W * p_W_a0 * p_A[0], p_X_W * p_W_a1 * p_A[1]))\n", + " post_A = lik_W_A.sum(axis=1) # sum over W\n", + " post_A = post_A / post_A.sum() # normalise\n", + "\n", + " # Posterior over W (P(W|X=x) marginalising over A)\n", + " post_W = lik_W_A.sum(axis=0) # sum over A\n", + " post_W = post_W / post_W.sum() # normalise\n", + "\n", + " # KL divergences\n", + " KL_W = (post_W * np.log(post_W / p_W)).sum()\n", + " KL_A = (post_A * np.log(post_A / p_A)).sum()\n", + "\n", + " return post_W, post_A, KL_W, KL_A" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4e5761b4", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Plotting functions\n", + "# @markdown\n", + "\n", + "def plot_testing(results_seed, discrimination_seed, seeds, title):\n", + " print(results_seed)\n", + " print(discrimination_seed)\n", + "\n", + " Testing_graph_names = [\"Suprathreshold stimulus\", \"Subthreshold stimulus\", \"Low Vision\"]\n", + "\n", + " fig, ax = plt.subplots(figsize=(14, len(results_seed[0]) * 2 + 2)) # Adjusted for added header space\n", + " ax.axis('off')\n", + " ax.axis('tight')\n", + 
"\n", + " # Define column labels\n", + " col_labels = [\"Scenario\", \"F1 SCORE\\n(2nd order network)\", \"RECALL\\n(2nd order network)\", \"PRECISION\\n(2nd order network)\", \"Discrimination Performance\\n(1st order network)\", \"ACCURACY\\n(2nd order network)\"]\n", + "\n", + " # Initialize list to hold all rows of data including headers\n", + " full_data = []\n", + "\n", + " # Calculate averages and standard deviations\n", + " for i in range(len(results_seed[0])):\n", + " metrics_list = [result[i][\"metrics\"][0] for result in results_seed] # Collect metrics for each seed\n", + " discrimination_list = [discrimination_seed[j][i] for j in range(seeds)]\n", + "\n", + " # Calculate averages and standard deviations for metrics\n", + " avg_metrics = np.mean(metrics_list, axis=0).tolist()\n", + " std_metrics = np.std(metrics_list, axis=0).tolist()\n", + "\n", + " # Calculate average and standard deviation for discrimination performance\n", + " avg_discrimination = np.mean(discrimination_list)\n", + " std_discrimination = np.std(discrimination_list)\n", + "\n", + " # Format the row with averages and standard deviations\n", + " row = [\n", + " Testing_graph_names[i],\n", + " f\"{avg_metrics[2]:.2f} ± {std_metrics[2]:.2f}\", # F1 SCORE\n", + " f\"{avg_metrics[1]:.2f} ± {std_metrics[1]:.2f}\", # RECALL\n", + " f\"{avg_metrics[0]:.2f} ± {std_metrics[0]:.2f}\", # PRECISION\n", + " f\"{avg_discrimination:.2f} ± {std_discrimination:.2f}\", # Discrimination Performance\n", + " f\"{avg_metrics[3]:.2f} ± {std_metrics[3]:.2f}\" # ACCURACY\n", + " ]\n", + " full_data.append(row)\n", + "\n", + " # Extract metric values for color scaling (excluding the first and last columns which are text)\n", + " metric_values = np.array([[float(x.split(\" ± \")[0]) for x in row[1:]] for row in full_data]) # Convert to float for color scaling\n", + " max_value = np.max(metric_values)\n", + " colors = metric_values / max_value # Normalize for color mapping\n", + "\n", + " # Prepare colors for all 
cells, defaulting to white for non-metric cells\n", + " cell_colors = [[\"white\"] * len(col_labels) for _ in range(len(full_data))]\n", + " for i, row in enumerate(colors):\n", + " cell_colors[i][1] = plt.cm.RdYlGn(row[0])\n", + " cell_colors[i][2] = plt.cm.RdYlGn(row[1])\n", + " cell_colors[i][3] = plt.cm.RdYlGn(row[2])\n", + " cell_colors[i][5] = plt.cm.RdYlGn(row[3]) # Adding color for accuracy\n", + "\n", + " # Adding color for discrimination performance\n", + " discrimination_colors = colors[:, 3]\n", + " for i, dp_color in enumerate(discrimination_colors):\n", + " cell_colors[i][4] = plt.cm.RdYlGn(dp_color)\n", + "\n", + " # Create the main table with cell colors\n", + " table = ax.table(cellText=full_data, colLabels=col_labels, loc='center', cellLoc='center', cellColours=cell_colors)\n", + " table.auto_set_font_size(False)\n", + " table.set_fontsize(10)\n", + " table.scale(1.5, 1.5)\n", + "\n", + " # Set the height of the header row to be double that of the other rows\n", + " for j, col_label in enumerate(col_labels):\n", + " cell = table[(0, j)]\n", + " cell.set_height(cell.get_height() * 2)\n", + "\n", + " # Add chance level table\n", + " chance_level_data = [[\"Chance Level\\nDiscrimination(1st)\", \"Chance Level\\nAccuracy(2nd)\"],\n", + " [\"0.010\", \"0.50\"]]\n", + "\n", + " chance_table = ax.table(cellText=chance_level_data, bbox=[1.0, 0.8, 0.3, 0.1], cellLoc='center', colWidths=[0.1, 0.1])\n", + " chance_table.auto_set_font_size(False)\n", + " chance_table.set_fontsize(10)\n", + " chance_table.scale(1.2, 1.2)\n", + "\n", + " # Set the height of the header row to be double that of the other rows in the chance level table\n", + " for j in range(len(chance_level_data[0])):\n", + " cell = chance_table[(0, j)]\n", + " cell.set_height(cell.get_height() * 2)\n", + "\n", + " plt.title(title, pad=20, fontsize=16)\n", + " plt.show()\n", + " plt.close(fig)\n", + "\n", + "\n", + "def plot_signal_max_and_indicator(patterns_tensor, plot_title=\"Training 
Signals\"):\n", + " \"\"\"\n", + " Plots the maximum values of signal units and a binary indicator for max values greater than 0.5.\n", + "\n", + " Parameters:\n", + " - patterns_tensor: A tensor containing signals, where each signal is expected to have multiple units.\n", + " \"\"\"\n", + " with plt.xkcd():\n", + "\n", + " # Calculate the maximum value of units for each signal within the patterns tensor\n", + " max_values_of_units = patterns_tensor.max(dim=1).values.cpu().numpy() # Ensure it's on CPU and in NumPy format for plotting\n", + "\n", + " # Determine the binary indicators based on the max value being greater than 0.5\n", + " binary_indicators = (max_values_of_units > 0.5).astype(int)\n", + "\n", + " # Create a figure with 2 subplots (2 rows, 1 column)\n", + " fig, axs = plt.subplots(2, 1, figsize=(8, 8))\n", + "\n", + " fig.suptitle(plot_title, fontsize=16) # Set the overall title for the plot\n", + "\n", + " # First subplot for the maximum values of each signal\n", + " axs[0].plot(range(patterns_tensor.size(0)), max_values_of_units, drawstyle='steps-mid')\n", + " axs[0].set_xlabel('Pattern Number')\n", + " axs[0].set_ylabel('Max Value of Signal Units')\n", + " axs[0].set_ylim(-0.1, 1.1) # Adjust y-axis limits for clarity\n", + " axs[0].grid(True)\n", + "\n", + " # Second subplot for the binary indicators\n", + " axs[1].plot(range(patterns_tensor.size(0)), binary_indicators, drawstyle='steps-mid', color='red')\n", + " axs[1].set_xlabel('Pattern Number')\n", + " axs[1].set_ylabel('Indicator (Max > 0.5) in each signal')\n", + " axs[1].set_ylim(-0.1, 1.1) # Adjust y-axis limits for clarity\n", + " axs[1].grid(True)\n", + "\n", + " plt.tight_layout()\n", + " plt.show()\n", + "\n", + "\n", + "def perform_quadratic_regression(epoch_list, values):\n", + " # Perform quadratic regression\n", + " coeffs = np.polyfit(epoch_list, values, 2) # Coefficients of the polynomial\n", + " y_pred = np.polyval(coeffs, epoch_list) # Evaluate the polynomial at the given x 
values\n", + " return y_pred\n", + "\n", + "\n", + "def pre_train_plots(epoch_1_order, epoch_2_order, title, max_values_indices):\n", + " \"\"\"\n", + " Plots the training progress with regression lines and scatter plots of indices and values of max elements.\n", + "\n", + " Parameters:\n", + " - epoch_list (list): List of epoch numbers.\n", + " - epoch_1_order (list): Loss values for the first-order network over epochs.\n", + " - epoch_2_order (list): Loss values for the second-order network over epochs.\n", + " - title (str): Title for the plots.\n", + " - max_values_indices (tuple): Tuple containing lists of max values and indices for both tensors.\n", + " \"\"\"\n", + " (max_values_output_first_order,\n", + " max_indices_output_first_order,\n", + " max_values_patterns_tensor,\n", + " max_indices_patterns_tensor) = max_values_indices\n", + "\n", + " # Perform quadratic regression for the loss plots\n", + " epoch_list = list(range(len(epoch_1_order)))\n", + " y_pred1 = perform_quadratic_regression(epoch_list, epoch_1_order)\n", + " y_pred2 = perform_quadratic_regression(epoch_list, epoch_2_order)\n", + "\n", + " # Set up the plot with 2 rows and 2 columns\n", + " fig, axs = plt.subplots(2, 2, figsize=(15, 10))\n", + "\n", + " # First graph for 1st Order Network\n", + " axs[0, 0].plot(epoch_list, epoch_1_order, linestyle='--', marker='o', color='g')\n", + " axs[0, 0].plot(epoch_list, y_pred1, linestyle='-', color='r', label='Quadratic Fit')\n", + " axs[0, 0].legend(['1st Order Network', 'Quadratic Fit'])\n", + " axs[0, 0].set_title('1st Order Network Loss')\n", + " axs[0, 0].set_xlabel('Epochs - Pretraining Phase')\n", + " axs[0, 0].set_ylabel('Loss')\n", + "\n", + " # Second graph for 2nd Order Network\n", + " axs[0, 1].plot(epoch_list, epoch_2_order, linestyle='--', marker='o', color='b')\n", + " axs[0, 1].plot(epoch_list, y_pred2, linestyle='-', color='r', label='Quadratic Fit')\n", + " axs[0, 1].legend(['2nd Order Network', 'Quadratic Fit'])\n", + " axs[0, 
1].set_title('2nd Order Network Loss')\n", + " axs[0, 1].set_xlabel('Epochs - Pretraining Phase')\n", + " axs[0, 1].set_ylabel('Loss')\n", + "\n", + " # Scatter plot of indices: patterns_tensor vs. output_first_order\n", + " axs[1, 0].scatter(max_indices_patterns_tensor, max_indices_output_first_order, alpha=0.5)\n", + "\n", + " # Add quadratic regression line\n", + " indices_regression = perform_quadratic_regression(max_indices_patterns_tensor, max_indices_output_first_order)\n", + " axs[1, 0].plot(max_indices_patterns_tensor, indices_regression, color='skyblue', linestyle='--', label='Quadratic Fit')\n", + "\n", + " axs[1, 0].set_title('Stimuli location: First Order Input vs. First Order Output')\n", + " axs[1, 0].set_xlabel('First Order Input Indices')\n", + " axs[1, 0].set_ylabel('First Order Output Indices')\n", + " axs[1, 0].legend()\n", + "\n", + " # Scatter plot of values: patterns_tensor vs. output_first_order\n", + " axs[1, 1].scatter(max_values_patterns_tensor, max_values_output_first_order, alpha=0.5)\n", + "\n", + " # Add quadratic regression line\n", + " values_regression = perform_quadratic_regression(max_values_patterns_tensor, max_values_output_first_order)\n", + " axs[1, 1].plot(max_values_patterns_tensor, values_regression, color='skyblue', linestyle='--', label='Quadratic Fit')\n", + "\n", + " axs[1, 1].set_title('Stimuli Values: First Order Input vs. 
First Order Output')\n", + " axs[1, 1].set_xlabel('First Order Input Values')\n", + " axs[1, 1].set_ylabel('First Order Output Values')\n", + " axs[1, 1].legend()\n", + "\n", + " plt.suptitle(title, fontsize=16, y=1.02)\n", + "\n", + " # Display the plots in a 2x2 grid\n", + " plt.tight_layout()\n", + " plt.savefig('Blindsight_Pre_training_Loss_{}.png'.format(title.replace(\" \", \"_\").replace(\"/\", \"_\")), bbox_inches='tight')\n", + " plt.show()\n", + " plt.close(fig)\n", + "\n", + "# Function to configure the training environment and load the models\n", + "def config_training(first_order_network, second_order_network, hidden, factor, gelu):\n", + " \"\"\"\n", + " Configures the training environment by saving the state of the given models and loading them back.\n", + " Initializes testing patterns for evaluation.\n", + "\n", + " Parameters:\n", + " - first_order_network: The first order network instance.\n", + " - second_order_network: The second order network instance.\n", + " - hidden: Number of hidden units in the first order network.\n", + " - factor: Factor influencing the network's architecture.\n", + " - gelu: Activation function to be used in the network.\n", + "\n", + " Returns:\n", + " - Tuple of testing patterns, number of samples in the testing patterns, and the loaded model instances.\n", + " \"\"\"\n", + " # Paths where the models' states will be saved\n", + " PATH = './cnn1.pth'\n", + " PATH_2 = './cnn2.pth'\n", + "\n", + " # Save the weights of the pretrained networks to the specified paths\n", + " torch.save(first_order_network.state_dict(), PATH)\n", + " torch.save(second_order_network.state_dict(), PATH_2)\n", + "\n", + " # Generating testing patterns for three different sets\n", + " First_set, First_set_targets = create_patterns(0,factor)\n", + " Second_set, Second_set_targets = create_patterns(1,factor)\n", + " Third_set, Third_set_targets = create_patterns(2,factor)\n", + "\n", + " # Aggregate testing patterns and their targets for ease of 
access\n", + " Testing_patterns = [[First_set, First_set_targets], [Second_set, Second_set_targets], [Third_set, Third_set_targets]]\n", + "\n", + " # Determine the number of samples from the first set (assumed consistent across all sets)\n", + " n_samples = len(Testing_patterns[0][0])\n", + "\n", + " # Initialize and load the saved states into model instances\n", + " loaded_model = FirstOrderNetwork(hidden, factor, gelu)\n", + " loaded_model_2 = SecondOrderNetwork(gelu)\n", + "\n", + " loaded_model.load_state_dict(torch.load(PATH))\n", + " loaded_model_2.load_state_dict(torch.load(PATH_2))\n", + "\n", + " # Ensure the models are moved to the appropriate device (CPU/GPU) and set to evaluation mode\n", + " loaded_model.to(device)\n", + " loaded_model_2.to(device)\n", + "\n", + " loaded_model.eval()\n", + " loaded_model_2.eval()\n", + "\n", + " return Testing_patterns, n_samples, loaded_model, loaded_model_2" + ] + }, + { + "cell_type": "markdown", + "id": "de910a5b", + "metadata": { + "execution": {} + }, + "source": [ + "---\n", + "\n", + "# Introduction\n", + "\n", + "This bonus tutorial extends the content covered in Tutorial 1 on the theme of consciousness. At the end of Section 2 of that tutorial, we discussed and implemented ideas around first-order models and briefly mentioned second-order models. In this tutorial, we develop those ideas further and model the effects of blindsight, the phenomenon introduced earlier today, in which patients have no conscious experience of sight yet are able to navigate around objects (showing that their brains process sensory information, even though it does not reach the level of subjective experience). We first introduce the code for the first-order model, followed by the second-order model. Then we show some ways to plot the results from these models.\n", + "\n", + "After this, we end with some further high-level thoughts on the theme of consciousness. 
\n" + ] + }, + { + "cell_type": "markdown", + "id": "76dd7488-6558-4022-8541-22765f2967c6", + "metadata": { + "execution": {} + }, + "source": [ + "## Section 1: Train a First-Order Network\n", + "\n", + "This section invites you to engage with a straightforward, auto-generated dataset on blindsight, originally introduced by [Pasquali et al. in 2010](https://www.sciencedirect.com/science/article/abs/pii/S0010027710001794). Blindsight is a fascinating condition where individuals who are cortically blind due to damage in their primary visual cortex can still respond to visual stimuli without conscious perception. This intriguing phenomenon underscores the intricate nature of sensory processing and the brain's ability to process information without conscious awareness." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c79b0a2-8e12-44ea-a685-bba788f6685d", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Visualize the autogenerated data\n", + "factor=2\n", + "initialize_global()\n", + "set_pre, _ = create_patterns(0,factor)\n", + "plot_signal_max_and_indicator(set_pre.detach().cpu(), \"Example - Pre training dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "cff70408-8662-43f5-b930-fc2a6ffca323", + "metadata": { + "execution": {} + }, + "source": [ + "The pre-training dataset for the network consisted of 200 patterns. These were evenly divided: half were purely noise (with unit activations randomly chosen between 0.0 and 0.02), and the other half represented potential stimuli. In the stimulus patterns, 99 out of 100 units had activations ranging between 0.0 and 0.02, with one unique unit having an activation between 0.0 and 1.0." 
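The pre-training set described above (half noise, half single-unit stimuli) can be sketched in plain Python. Note that `make_pretraining_set` below is a hypothetical illustration, not the tutorial's actual `create_patterns` helper, whose implementation details may differ:

```python
import random

def make_pretraining_set(n_patterns=200, n_units=100, seed=0):
    """Sketch: half pure-noise patterns, half patterns with one stimulus unit."""
    rng = random.Random(seed)
    half = n_patterns // 2
    # Noise patterns: every unit activation uniform in [0.0, 0.02]
    noise = [[rng.uniform(0.0, 0.02) for _ in range(n_units)] for _ in range(half)]
    # Stimulus patterns: same baseline noise, plus one unit drawn from [0.0, 1.0]
    stimuli = []
    for _ in range(half):
        p = [rng.uniform(0.0, 0.02) for _ in range(n_units)]
        p[rng.randrange(n_units)] = rng.uniform(0.0, 1.0)
        stimuli.append(p)
    return noise + stimuli

patterns = make_pretraining_set()
print(len(patterns), len(patterns[0]))  # 200 100
```

Seeding the generator makes the sketch reproducible, mirroring the seeded runs used later in this notebook.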
+ ] + }, + { + "cell_type": "markdown", + "id": "0f45662a-08b4-44a4-89e1-200fc0c9cddb", + "metadata": { + "execution": {} + }, + "source": [ + "**Testing patterns**\n", + "\n", + "As we have seen before, the network underwent evaluations under three distinct conditions, each modifying the signal-to-noise ratio in a unique way to explore different degrees and types of blindness.\n", + "\n", + "Suprathreshold stimulus condition: here, the network was exposed to the identical set of 200 patterns used during pre-training, testing the network's response to familiar inputs.\n", + "\n", + "Subthreshold stimulus condition (blindsight simulation): this condition aimed to mimic blindsight. It was achieved by introducing a slight noise increment (+0.0012) to every input of the first-order network, barring the one designated as the stimulus. This setup tested the network's ability to discern faint signals amidst noise.\n", + "\n", + "Low vision condition: to simulate low vision, the activation levels of the stimuli were reduced. Unlike the range from 0.0 to 1.0 used in pre-training, the stimuli's activation levels were adjusted to span from 0.0 to 0.3. This condition examined the network's capability to recognize stimuli with diminished intensity." 
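The three test conditions above can be read as simple transforms of a pre-training pattern. This is a hedged sketch: `apply_condition` is a hypothetical helper, and the tutorial's `create_patterns` may realize the same conditions differently:

```python
def apply_condition(pattern, stim_unit, condition):
    """Illustrative transforms for the three test conditions (hypothetical helper)."""
    p = list(pattern)
    if condition == "suprathreshold":
        return p  # identical to the pre-training patterns
    if condition == "subthreshold":
        # add +0.0012 to every input except the designated stimulus unit
        return [v if i == stim_unit else v + 0.0012 for i, v in enumerate(p)]
    if condition == "low_vision":
        # rescale the stimulus activation from the [0.0, 1.0] range down to [0.0, 0.3]
        p[stim_unit] *= 0.3
        return p
    raise ValueError(f"unknown condition: {condition}")

base = [0.01] * 100
base[7] = 0.8  # stimulus at unit 7
low = apply_condition(base, 7, "low_vision")
print(round(low[7], 3))  # 0.24
```

The subthreshold transform leaves the stimulus unit untouched while raising the noise floor, which is exactly what shrinks the signal-to-noise ratio in the blindsight simulation.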
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db58d78b-17d8-4651-801a-f06e568a7322", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "factor=2\n", + "# Compare your results with the patterns generated below\n", + "set_1, _ = create_patterns(0,factor)\n", + "set_2, _ = create_patterns(1,factor)\n", + "set_3, _ = create_patterns(2,factor)\n", + "\n", + "# Plot\n", + "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", + "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", + "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "cd5c13e0-75e8-45c2-b1be-70496041364b", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 1: Building a network for a blindsight situation\n", + "\n", + "In this activity, we'll construct a neural network model using our auto-generated dataset, focusing on blindsight scenarios. The model will primarily consist of fully connected layers, establishing a straightforward, first-order network. The aim here is to assess the basic network's performance.\n", + "\n", + "**Steps to follow**\n", + "\n", + "1. Examine the network architecture: understand the structure of the neural network you're about to work with.\n", + "2. Visualize loss metrics: observe and analyze the network's performance during pre-training by visualizing the loss over epochs.\n", + "3. Evaluate the model: use the provided code snippets to calculate and interpret the model's accuracy, recall, and F1-score, giving you insight into the network's capabilities.\n", + "\n", + "**Understanding the process**\n", + "\n", + "The goal is to gain a thorough comprehension of the network's architecture and to interpret the pre-training results visually. 
This will provide a clearer picture of the model's potential and limitations.\n", + "\n", + "The network is designed as a backpropagation autoassociator. It features a 100-unit input layer, directly linked to a 40-unit hidden layer, which in turn connects to a 100-unit output layer. Initial connection weights are set within the range of -1.0 to 1.0 for the first-order network. To mitigate overfitting, dropout is employed within the network architecture. The architecture includes a configurable activation function. This flexibility allows for adjustments and tuning in Activity 3, aiming for optimal model performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94d0bcaf-8b49-4e35-b0d2-1b9dcc98b182", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "class FirstOrderNetwork(nn.Module):\n", + " def __init__(self, hidden_units, data_factor, use_gelu):\n", + " \"\"\"\n", + " Initializes the FirstOrderNetwork with specific configurations.\n", + "\n", + " Parameters:\n", + " - hidden_units (int): The number of units in the hidden layer.\n", + " - data_factor (int): Factor to scale the amount of data processed.\n", + " A factor of 1 indicates the default data amount,\n", + " while 10 indicates 10 times the default amount.\n", + " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", + " \"\"\"\n", + " super(FirstOrderNetwork, self).__init__()\n", + "\n", + " # Define the encoder, hidden, and decoder layers with specified units\n", + "\n", + " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", + " self.hidden= nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", + " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", + "\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + "\n", + "\n", + " # Dropout layer to prevent overfitting\n", + " self.dropout = nn.Dropout(0.1)\n", + "\n", + " # Set the data factor\n", + " 
self.data_factor = data_factor\n", + "\n", + " # Other activation functions for various purposes\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize network weights\n", + " self.initialize_weights()\n", + "\n", + " def initialize_weights(self):\n", + " \"\"\"Initializes weights of the encoder, hidden, and decoder layers uniformly.\"\"\"\n", + " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", + "\n", + " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", + " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", + "\n", + " def encoder(self, x):\n", + " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", + " return h1\n", + "\n", + " def decoder(self,z):\n", + " #h2 = self.relu(self.hidden(z))\n", + " h2 = self.sigmoid(self.fc2(z))\n", + " return h2\n", + "\n", + "\n", + " def forward(self, x):\n", + " \"\"\"\n", + " Defines the forward pass through the network.\n", + "\n", + " Parameters:\n", + " - x (Tensor): The input tensor to the network.\n", + "\n", + " Returns:\n", + " - Tensor: The output of the network after passing through the layers and activations.\n", + " \"\"\"\n", + " h1 = self.encoder(x)\n", + " h2 = self.decoder(h1)\n", + "\n", + " return h1 , h2" + ] + }, + { + "cell_type": "markdown", + "id": "83e07f1a-540b-4bfa-8e9b-f114319f1f96", + "metadata": { + "execution": {} + }, + "source": [ + "For now, we will train the first-order network only." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4202ab0d", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the architecture, optimizers, loss functions, and schedulers for pre-training\n", + "seeds=15\n", + "\n", + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Hyperparameters\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + "factor=2\n", + "gelu=False\n", + "gam=0.98\n", + "meta=True\n", + "stepsize=25\n", + "\n", + "for i in range(seeds):\n", + " print(f\"Seed {i}\")\n", + "\n", + " # Reset global state for this seed\n", + " initialize_global()\n", + "\n", + " # Prepare networks, loss functions, optimizers, and schedulers for pre-training\n", + " first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2 = prepare_pre_training(hidden, factor, gelu, stepsize, gam)\n", + "\n", + " # Conduct pre-training for both the first-order and second-order networks\n", + " first_order_network_pre, second_order_network_pre, epoch_1_order, epoch_2_order , max_value_indices = pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta)\n", + "\n", + " # Plot the training progress of both networks to visualize performance and learning trends\n", + " pre_train_plots(epoch_1_order, epoch_2_order, f\"1st & 2nd Order Networks - Seed {i}\" , max_value_indices )\n", + "\n", + " # Configuration step for the main training phase or evaluation\n", + " testing_patterns, n_samples = get_test_patterns(factor)\n", + "\n", + " # Function to test the model using the configured testing patterns\n", + " first_order_network_pre.eval()\n", + " second_order_network_pre.eval()\n", + " f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, first_order_network_pre, 
second_order_network_pre,factor)\n", + " results_seed.append(results_for_plotting)\n", + " discrimination_seed.append(discrimination_performances)\n", + "\n", + "plot_testing(results_seed, discrimination_seed, seeds, \"Test Results\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bfade3d-6385-459c-8f07-e3017264455a", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "\n", + "# Hyperparameters\n", + "global optimizer ,n_epochs , learning_rate_1\n", + "learning_rate_1 = 0.5\n", + "n_epochs = 100\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + "factor=2\n", + "gelu=False\n", + "gam=0.98\n", + "meta=True\n", + "stepsize=25\n", + "initialize_global()\n", + "\n", + "\n", + "# Networks instantiation\n", + "first_order_network = FirstOrderNetwork(hidden,factor,gelu).to(device)\n", + "second_order_network = SecondOrderNetwork(gelu).to(device) # We define it, but won't use it until activity 3\n", + "\n", + "# Loss function\n", + "criterion_1 = CAE_loss\n", + "\n", + "# Optimizer\n", + "optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", + "\n", + "# Learning rate schedulers\n", + "scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", + "\n", + "max_values_output_first_order = []\n", + "max_indices_output_first_order = []\n", + "max_values_patterns_tensor = []\n", + "max_indices_patterns_tensor = []\n", + "\n", + "# Training loop\n", + "for epoch in range(n_epochs):\n", + " # Generate training patterns and targets for each epoch.\n", + " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", + "\n", + " # Forward pass through the first-order network\n", + " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", + "\n", + " 
output_first_order=output_first_order.requires_grad_(True)\n", + "\n", + " # Skip computations for the second-order network\n", + " with torch.no_grad():\n", + "\n", + " # Potentially forward pass through the second-order network without tracking gradients\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", + " W = first_order_network.state_dict()['fc1.weight']\n", + " loss_1 = criterion_1( W, stim_present_tensor.view(-1, 100), output_first_order,\n", + " hidden_representation, lam )\n", + " # Backpropagate the first-order network's loss\n", + " loss_1.backward()\n", + "\n", + " # Update first-order network weights\n", + " optimizer_1.step()\n", + "\n", + " # Reset first-order optimizer gradients to zero for the next iteration\n", + " optimizer_1.zero_grad()\n", + "\n", + " # Update the first-order scheduler\n", + " scheduler_1.step()\n", + "\n", + " epoch_1_order[epoch] = loss_1.item()\n", + "\n", + " # Get max values and indices for output_first_order\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " # Get max values and indices for patterns_tensor\n", + " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + "\n", + "max_value_indices = (max_values_output_first_order[-1],\n", + " max_indices_output_first_order[-1],\n", + " max_values_patterns_tensor[-1],\n", + " max_indices_patterns_tensor[-1])\n", + "\n", + "\n", + "# Plot training loss curve\n", + "pre_train_plots(epoch_1_order, epoch_2_order, \"1st & 2nd Order Networks\" , 
max_value_indices )" + ] + }, + { + "cell_type": "markdown", + "id": "fa7d1ce9-da1b-4f78-b388-47bfcd50c6dd", + "metadata": { + "execution": {} + }, + "source": [ + "### Testing under 3 Blindsight Conditions\n", + "\n", + "We will now use the testing auto-generated datasets from activity 1 to test the network's performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2affe162-f4d9-495f-862a-65b0f50ca5ef", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Prepare networks for testing by calling the configuration function\n", + "testing_patterns, n_samples, loaded_model, loaded_model_2 = config_training(first_order_network, second_order_network, hidden, factor, gelu)\n", + "\n", + "# Perform testing using the defined function and plot the results\n", + "f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, loaded_model, loaded_model_2,factor)\n", + "\n", + "results_seed.append(results_for_plotting)\n", + "discrimination_seed.append(discrimination_performances)\n", + "# Assuming plot_testing is defined, call it to display results\n", + "plot_testing(results_seed,discrimination_seed, 1, \"Seed\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d18302e-4657-4732-b6ef-f7439d2bb2fd", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_First_order_network\")" + ] + }, + { + "cell_type": "markdown", + "id": "96579a08-3c95-4dfe-9908-fabe1bb146d0", + "metadata": { + "execution": {} + }, + "source": [ + "## Section 2: Train a Second-Order network" + ] + }, + { + "cell_type": "markdown", + "id": "caac41bc-5a93-43bf-aede-7c1e87e83fbd", + "metadata": { + "execution": {} + }, + "source": [ + "Having previously examined the first-order 
network, we now switch to the second-order network, described in more detail in Tutorial 1 (revisit the text and video content there if you want to refresh your understanding of the difference between these models).\n", + "\n", + "To study this, we use a simulated dataset that mimics the conditions of blindsight. This dataset contains 400 patterns, equally split between two types:\n", + "\n", + "- **Random noise patterns**: low activations ranging between 0.0 and 0.02.\n", + "- **Designed stimulus patterns**: each pattern includes one unit with a higher activation level, varying between 0.0 and 1.0.\n", + "\n", + "This dataset allows us to test hypotheses concerning how sensory processing and network responses adapt under different conditions of visual impairment.\n", + "\n", + "We have three main testing scenarios, each designed to alter the signal-to-noise ratio to simulate different levels of visual impairment:\n", + "\n", + "- **Suprathreshold stimulus condition**: here, the network is tested against familiar patterns used during training to assess its response to known stimuli.\n", + "- **Subthreshold stimulus condition**: this condition slightly increases the noise level, akin to actual blindsight conditions, testing the network's capability to discern subtle signals.\n", + "- **Low vision condition**: the intensity of stimuli is decreased to evaluate how well the network performs with significantly reduced sensory input."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b549db9-e8b0-4c49-89d2-b7324b3a4ed1", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "factor=2\n", + "\n", + "initialize_global()\n", + "set_1, _ = create_patterns(0,factor)\n", + "set_2, _ = create_patterns(1,factor)\n", + "set_3, _ = create_patterns(2,factor)\n", + "\n", + "# Plot\n", + "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", + "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", + "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "96a91af5-c498-429d-a407-afa66d7444db", + "metadata": { + "execution": {} + }, + "source": [ + "The first-order network model lays the groundwork for our experiments and is structured as follows:\n", + "\n", + "- Input layer: consists of 100 units representing either noise or stimulus patterns.\n", + "- Hidden layer: includes a 40-unit layer tasked with processing the inputs.\n", + "- Output layer: comprises 100 units where the responses to stimuli are recorded.\n", + "- Dropout and activation: includes dropout layers to prevent overfitting and a temperature-controlled activation function to fine-tune response sharpness.\n", + "\n", + "The primary aim of the first-order network is to accurately capture and react to the input patterns, setting a baseline for comparison with more complex models." + ] + }, + { + "cell_type": "markdown", + "id": "768e074d-1a07-4f3e-8a5d-de31849e7730", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 2: Developing a Second-Order Network\n", + "\n", + "Your task is to expand upon the first-order network by integrating a second-order network that incorporates a metacognitive layer assessing the predictions of the first-order network. 
This metacognitive layer introduces a wagering mechanism, wherein the network \"bets\" on its confidence in its predictions. \n", + "\n", + "- The first-order network is designed as an autoencoder, a type of neural network trained to reconstruct the input stimulus. The autoencoder consists of an encoder that compresses the input into a latent representation and a decoder that reconstructs the input from this representation.\n", + "- The second-order network, or metacognitive layer, operates by examining the difference (delta) between the original input and the output generated by the autoencoder. This difference provides insight into the reconstruction error, which is a measure of how accurately the autoencoder has learned to replicate the input data. By evaluating this reconstruction error, the second-order network can make a judgement about the certainty of the first-order network's predictions.\n", + "\n", + "These are the steps for completion:\n", + "\n", + "1. Architectural development: grasp the underlying principles of a second-order network and complete the architectural code.\n", + "2. Performance evaluation: visualize training losses and test the model using provided code, assessing its initial performance.\n", + "3. Model fine-tuning: leveraging the provided training function, experiment with fine-tuning the model to enhance its accuracy and efficiency.\n", + "\n", + "The second-order network is structured as a feedforward backpropagation network.\n", + "\n", + "- Input layer: comprises a 100-unit comparison matrix. This matrix quantifies the discrepancy between each corresponding pair of input and output units from the first-order network. For example, if an input unit and its corresponding output unit have activations of 0.6 and 0.7, respectively, the comparison unit's activation would be -0.1. 
This setup essentially encodes the prediction error of the first-order network's outputs as an input pattern for the second-order network.\n", + "- Output layer: consists of two units representing \"high\" and \"low\" wagers, indicating the network's confidence in its predictions. The initial weights for these output units range between 0.0 and 0.1.\n", + "- Comparator weights: set to 1.0 for connections from the first-order input layer to the comparison matrix, and -1.0 for connections from the first-order output layer. This configuration emphasizes the differential error as a critical input for the second-order decision-making process.\n", + "\n", + "The second-order network's novel approach uses the error generated by the first-order network as a direct input for making decisions—specifically, wagering on the confidence of its outputs. This methodology reflects a metacognitive layer of processing, akin to evaluating one's confidence in their answers or predictions.\n", + "\n", + "By exploring these adjustments, you can optimize the network's functionality, making it a powerful tool for understanding and simulating complex cognitive phenomena like blindsight." 
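The comparator arithmetic above can be made concrete with a tiny worked example. This is plain Python for illustration only; the tutorial's `SecondOrderNetwork` implements the same idea with learned layers on top of the difference:

```python
def comparison_units(first_order_input, first_order_output):
    # Fixed comparator weights: +1.0 from each first-order input unit and -1.0
    # from the corresponding output unit, so each comparison unit encodes
    # input - output, i.e. the first-order network's reconstruction error.
    return [i - o for i, o in zip(first_order_input, first_order_output)]

# The example from the text: input 0.6 vs. output 0.7 gives -0.1
delta = comparison_units([0.6, 0.01], [0.7, 0.01])
print([round(d, 3) for d in delta])  # [-0.1, 0.0]
```

A small delta everywhere means the autoencoder reconstructed its input well, which is the evidence the metacognitive layer uses to place a "high" wager.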
+ ] + }, + { + "cell_type": "markdown", + "id": "2c37e357-e5e6-40b2-8507-f83161f5d85f", + "metadata": { + "colab_type": "text", + "execution": {} + }, + "source": [ + "```python\n", + "class SecondOrderNetwork(nn.Module):\n", + " def __init__(self, use_gelu):\n", + " super(SecondOrderNetwork, self).__init__()\n", + " # Define a linear layer for comparing the difference between input and output of the first-order network\n", + " self.comparison_layer = nn.Linear(100, 100)\n", + "\n", + " # Linear layer for determining wagers, mapping from 100 features to a single output\n", + " self.wager = nn.Linear(100, 1)\n", + "\n", + " # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " # Select activation function based on the `use_gelu` flag\n", + " self.activation = torch.relu\n", + "\n", + " # Additional activation functions for potential use in network operations\n", + " self.sigmoid = torch.sigmoid\n", + "\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize the weights of the network\n", + " self._init_weights()\n", + "\n", + " def _init_weights(self):\n", + " # Uniformly initialize weights for the comparison and wager layers\n", + " init.uniform_(self.comparison_layer.weight, -1.0, 1.0)\n", + " init.uniform_(self.wager.weight, 0.0, 0.1)\n", + "\n", + " def forward(self, first_order_input, first_order_output):\n", + " ############################################################\n", + " # Fill in the wager value\n", + " # Applying dropout and sigmoid activation to the output of the wager layer\n", + " raise NotImplementedError(\"Student exercise\")\n", + " 
############################################################\n", + "\n", + " # Calculate the difference between the first-order input and output\n", + " comparison_matrix = first_order_input - first_order_output\n", + "\n", + " #Another option is to directly calculate the per unit MSE to use as input for the comparator matrix\n", + " #comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input)\n", + "\n", + " # Pass the difference through the comparison layer and apply the chosen activation function\n", + " comparison_out=self.dropout(self.activation(self.comparison_layer(comparison_matrix)))\n", + "\n", + " # Calculate the wager value, applying dropout and sigmoid activation to the output of the wager layer\n", + " wager = ...\n", + "\n", + " return wager\n", + "\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d931cb5-a87a-48be-8760-79512b9d88f7", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# to_remove solution\n", + "class SecondOrderNetwork(nn.Module):\n", + " def __init__(self, use_gelu):\n", + " super(SecondOrderNetwork, self).__init__()\n", + " # Define a linear layer for comparing the difference between input and output of the first-order network\n", + " self.comparison_layer = nn.Linear(100, 100)\n", + "\n", + " # Linear layer for determining wagers, mapping from 100 features to a single output\n", + " self.wager = nn.Linear(100, 1)\n", + "\n", + " # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " # Select activation function based on the `use_gelu` flag\n", + " self.activation = torch.relu\n", + "\n", + " # Additional activation functions for potential use in network operations\n", + " self.sigmoid = torch.sigmoid\n", + "\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize the weights of the network\n", + " 
self._init_weights()\n", + "\n", + " def _init_weights(self):\n", + " # Uniformly initialize weights for the comparison and wager layers\n", + " init.uniform_(self.comparison_layer.weight, -1.0, 1.0)\n", + " init.uniform_(self.wager.weight, 0.0, 0.1)\n", + "\n", + " def forward(self, first_order_input, first_order_output):\n", + " # Calculate the difference between the first-order input and output\n", + " comparison_matrix = first_order_input - first_order_output\n", + "\n", + " #Another option is to directly calculate the per unit MSE to use as input for the comparator matrix\n", + " #comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input)\n", + "\n", + " # Pass the difference through the comparison layer and apply the chosen activation function\n", + " comparison_out=self.dropout(self.activation(self.comparison_layer(comparison_matrix)))\n", + "\n", + " # Calculate the wager value, applying dropout and sigmoid activation to the output of the wager layer\n", + " wager = self.sigmoid(self.wager(comparison_out))\n", + "\n", + " return wager" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "736319ec-2a17-4d80-bb04-b9507ba5db5d", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Second_Order_Network\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "947c8550-a40d-43aa-bfd6-1eb8cead339f", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# First order network instantiation\n", + "first_order_network = FirstOrderNetwork(hidden, factor, gelu).to(device)\n", + "\n", + "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "seeds=15\n", + "\n", + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Hyperparameters\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + "factor=2\n", + "gelu=False\n", + 
"gam=0.98\n", + "meta=True\n", + "stepsize=25\n", + "\n", + "for i in range(seeds):\n", + " print(f\"Seed {i}\")\n", + "\n", + " # Reset global state for this seed\n", + " initialize_global()\n", + "\n", + " # Prepare networks, loss functions, optimizers, and schedulers for pre-training\n", + " first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2 = prepare_pre_training(hidden, factor, gelu, stepsize, gam)\n", + "\n", + " # Conduct pre-training for both the first-order and second-order networks\n", + " first_order_network_pre, second_order_network_pre, epoch_1_order, epoch_2_order , max_value_indices = pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta)\n", + "\n", + " # Plot the training progress of both networks to visualize performance and learning trends\n", + " pre_train_plots(epoch_1_order, epoch_2_order, f\"1st & 2nd Order Networks - Seed {i}\" , max_value_indices )\n", + "\n", + " # Configuration step for the main training phase or evaluation\n", + " testing_patterns, n_samples = get_test_patterns(factor)\n", + "\n", + " # Function to test the model using the configured testing patterns\n", + " first_order_network_pre.eval()\n", + " second_order_network_pre.eval()\n", + " f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, first_order_network_pre, second_order_network_pre,factor)\n", + " results_seed.append(results_for_plotting)\n", + " discrimination_seed.append(discrimination_performances)\n", + "\n", + "plot_testing(results_seed, discrimination_seed, seeds, \"Test Results\")" + ] + }, + { + "cell_type": "markdown", + "id": "2047ee8a-4ebc-41dc-a77a-4e17f7c74947", + "metadata": { + "execution": {} + }, + "source": [ + "### Discussion point\n", + "\n", + "Let's dive into the 
outcomes!\n", + "\n", + "- Did you notice any variations between the two models?\n", + "- Can you explain how these differences influenced the performance?\n", + "- What role does a second-order network play, and in which situations would it be more effective?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55115815-beb2-4f19-a598-9b129ff87637", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Discussion_Point_Second_Order_Network\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a5a880a9-a069-4e0f-a481-f3b85b6a3952", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Video 1: Second Order Network\n", + "\n", + "from ipywidgets import widgets\n", + "from IPython.display import YouTubeVideo\n", + "from IPython.display import IFrame\n", + "from IPython.display import display\n", + "\n", + "class PlayVideo(IFrame):\n", + " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", + " self.id = id\n", + " if source == 'Bilibili':\n", + " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", + " elif source == 'Osf':\n", + " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", + " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", + "\n", + "def display_videos(video_ids, W=400, H=300, fs=1):\n", + " tab_contents = []\n", + " for i, video_id in enumerate(video_ids):\n", + " out = widgets.Output()\n", + " with out:\n", + " if video_ids[i][0] == 'Youtube':\n", + " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", + " height=H, fs=fs, rel=0)\n", + " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", + " else:\n", + " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", + " height=H, fs=fs, autoplay=False)\n", 
+ " if video_ids[i][0] == 'Bilibili':\n", + " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", + " elif video_ids[i][0] == 'Osf':\n", + " print(f'Video available at https://osf.io/{video.id}')\n", + " display(video)\n", + " tab_contents.append(out)\n", + " return tab_contents\n", + "\n", + "video_ids = [('Youtube', 'lHRP14mxXv8'), ('Bilibili', 'BV1jM4m1S7ek')]\n", + "tab_contents = display_videos(video_ids, W=854, H=480)\n", + "tabs = widgets.Tab()\n", + "tabs.children = tab_contents\n", + "for i in range(len(tab_contents)):\n", + " tabs.set_title(i, video_ids[i][0])\n", + "display(tabs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a54d67b-507e-4a8a-9715-0aacdeb06f26", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Video_1\")" + ] + }, + { + "cell_type": "markdown", + "id": "8a694f1e-3f32-48fc-bce8-0b544d43ca62", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 3: Plot Surfaces for Content / Awareness Inference\n", + "\n", + "To explore the properties of the HOSS model, we can simulate inference at different levels of the hierarchy over the full 2D space of possible input X's. The left panel below shows that the probability of awareness (of any stimulus contents) rises in a graded manner from the lower left corner of the graph (low activation of any feature) to the upper right (high activation of both features). In contrast, the right panel shows that confidence in making a discrimination response (e.g. rightward vs. leftward) increases away from the major diagonal, as the model becomes sure that the sample was generated by either a leftward or rightward tilted stimulus.\n", + "\n", + "Together, the two surfaces make predictions about the relationships we might see between discrimination confidence and awareness in a simple psychophysics experiment. 
One notable prediction is that discrimination could still be possible - and lead to some degree of confidence - even when the higher-order node is \"reporting\" unawareness of the stimulus.\n", + "\n", + "Now, let's get hands-on and plot those auto-generated patterns!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "77fbfe70", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "def HOSS_evaluate(X, mu, Sigma, Aprior, Wprior):\n", + " \"\"\"\n", + " Inference on 2D Bayes net for asymmetric inference on presence vs. absence.\n", + " \"\"\"\n", + "\n", + " # Initialise variables and conditional prob tables\n", + " p_A = np.array([1 - Aprior, Aprior]) # prior on awareness state A\n", + " p_W_a1 = np.append(0, Wprior) # likelihood of world states W given aware, first entry is absence\n", + " p_W_a0 = np.append(1, np.zeros(len(Wprior))) # likelihood of world states W given unaware, first entry is absence\n", + " p_W = (p_W_a1 + p_W_a0) / 2 # prior on W marginalising over A (for KL)\n", + "\n", + " # Compute likelihood of observed X for each possible W (P(X|mu_w, Sigma))\n", + " lik_X_W = np.array([multivariate_normal.pdf(X, mean=mu_i, cov=Sigma) for mu_i in mu])\n", + " p_X_W = lik_X_W / lik_X_W.sum() # normalise to get P(X|W)\n", + "\n", + " # Combine with likelihood of each world state w given awareness state A\n", + " lik_W_A = np.vstack((p_X_W * p_W_a0 * p_A[0], p_X_W * p_W_a1 * p_A[1]))\n", + " post_A = lik_W_A.sum(axis=1) # sum over W\n", + " post_A = post_A / post_A.sum() # normalise\n", + "\n", + " # Posterior over W (P(W|X=x) marginalising over A)\n", + " post_W = lik_W_A.sum(axis=0) # sum over A\n", + " post_W = post_W / post_W.sum() # normalise\n", + "\n", + " # KL divergences\n", + " KL_W = (post_W * np.log(post_W / p_W)).sum()\n", + " KL_A = (post_A * np.log(post_A / p_A)).sum()\n", + "\n", + " return post_W, post_A, KL_W, KL_A" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id":
"31503073-a7c0-4502-8d94-5ffa47a22926", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the grid\n", + "xgrid = np.arange(0, 2.01, 0.01)\n", + "\n", + "# Define the means for the Gaussian distributions\n", + "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", + "\n", + "# Define the covariance matrix\n", + "Sigma = np.array([[1, 0], [0, 1]])\n", + "\n", + "# Prior probabilities\n", + "Wprior = np.array([0.5, 0.5])\n", + "Aprior = 0.5\n", + "\n", + "# Initialize arrays to hold confidence and posterior probability\n", + "confW = np.zeros((len(xgrid), len(xgrid)))\n", + "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", + "\n", + "# Compute confidence and posterior probability for each point in the grid\n", + "for i, xi in tqdm(enumerate(xgrid), total=len(xgrid), desc='Outer Loop'):\n", + " for j, xj in enumerate(xgrid):\n", + " X = [xi, xj]\n", + " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", + " confW[i, j] = max(post_w[1], post_w[2])\n", + " posteriorAware[i, j] = post_A[1]\n", + "\n", + "with plt.xkcd():\n", + "\n", + " # Plotting\n", + " plt.figure(figsize=(10, 5))\n", + "\n", + " # Posterior probability \"seen\"\n", + " plt.subplot(1, 2, 1)\n", + " plt.contourf(xgrid, xgrid, posteriorAware.T)\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('Posterior probability \"seen\"')\n", + " plt.axis('square')\n", + "\n", + " # Confidence in identity\n", + " plt.subplot(1, 2, 2)\n", + " contour_set = plt.contourf(xgrid, xgrid, confW.T)\n", + " plt.colorbar()\n", + " plt.contour(xgrid, xgrid, posteriorAware.T, levels=[0.5], linewidths=4, colors=['white']) # Line contour for threshold\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('Confidence in identity')\n", + " plt.axis('square')\n", + "\n", + " plt.show()" + ] + }, + { + 
"cell_type": "markdown", + "id": "2d129657-62aa-42d1-970a-93fd67736b69", + "metadata": { + "execution": {} + }, + "source": [ + "### Simulate KL-divergence surfaces\n", + "\n", + "We can also simulate KL-divergences (a measure of Bayesian surprise) at each layer in the network, which, under predictive coding models of the brain, have been proposed to scale with neural activation (e.g., Friston, 2005; Summerfield & de Lange, 2014)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66044263-c8de-49a9-a56b-2e7336cc737c", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the grid\n", + "xgrid = np.arange(0, 2.01, 0.01)\n", + "\n", + "# Define the means for the Gaussian distributions\n", + "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", + "\n", + "# Define the covariance matrix\n", + "Sigma = np.array([[1, 0], [0, 1]])\n", + "\n", + "# Prior probabilities\n", + "Wprior = np.array([0.5, 0.5])\n", + "Aprior = 0.5\n", + "\n", + "# Initialize arrays to hold confidence and posterior probability\n", + "confW = np.zeros((len(xgrid), len(xgrid)))\n", + "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", + "\n", + "# Compute confidence and posterior probability for each point in the grid\n", + "for i, xi in enumerate(xgrid):\n", + " for j, xj in enumerate(xgrid):\n", + " X = [xi, xj]\n", + " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", + "\n", + " confW[i, j] = max(post_w[1], post_w[2])\n", + " posteriorAware[i, j] = post_A[1]\n", + "\n", + "# Calculate the mean KL divergence for absent and present awareness states\n", + "KL_A_absent = np.mean(KL_A[posteriorAware < 0.5])\n", + "KL_A_present = np.mean(KL_A[posteriorAware >= 0.5])\n", + "KL_w_absent = np.mean(KL_w[posteriorAware < 0.5])\n", + "KL_w_present = np.mean(KL_w[posteriorAware >= 0.5])\n", + "\n", + "with
plt.xkcd():\n", + "\n", + " # Plotting\n", + " plt.figure(figsize=(18, 6))\n", + "\n", + " # K-L divergence, perceptual states\n", + " plt.subplot(1, 2, 1)\n", + " plt.contourf(xgrid, xgrid, KL_w.T, cmap='viridis')\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('KL-divergence, perceptual states')\n", + " plt.axis('square')\n", + "\n", + " # K-L divergence, awareness state\n", + " plt.subplot(1, 2, 2)\n", + " plt.contourf(xgrid, xgrid, KL_A.T, cmap='viridis')\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('KL-divergence, awareness state')\n", + " plt.axis('square')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "b32e4908-0f6f-4259-832f-045adcb19700", + "metadata": { + "execution": {} + }, + "source": [ + "### Discussion point\n", + "\n", + "Can you recognise the difference between the KL divergence for the W-level and the one for the A-level?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7d8deb66-9a1d-49e1-a3ef-96970efa8d97", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# to_remove explanation\n", + "\"\"\"\n", + "At the level of perceptual states W, there is a substantial asymmetry in the KL-divergence expected when the\n", + "model says ‘seen’ vs. ‘unseen’ (lefthand panel). This is due to the large belief updates invoked in the\n", + "perceptual layer W by samples that deviate from the lower lefthand corner - from absence. In contrast, when\n", + "we compute KL-divergence for the A-level (righthand panel), the level of prediction error is symmetric across\n", + "seen and unseen decisions, leading to \"hot\" zones both at the upper righthand (present) and lower lefthand\n", + "(absent) corners of the 2D space.\n", + "\n", + "Intuitively, this means that at the W-level, there's a noticeable difference in the KL-divergence values\n", + "between \"seen\" and \"unseen\" predictions. 
This large difference is mainly due to significant updates in the\n", + "model's beliefs at this level when the detected samples are far from what is expected under the condition of\n", + "\"absence.\" However, when we analyze the K-L divergence at the A-level, the discrepancies in prediction errors\n", + "between \"seen\" and \"unseen\" are balanced. This creates equally strong responses in the model, whether something\n", + "is detected or not detected.\n", + "\n", + "We can also sort the KL-divergences as a function of whether the model \"reported\" presence or absence. As\n", + "can be seen in the bar plots below, there is more asymmetry in the prediction error at the W compared to the\n", + "A levels.\n", + "\n", + "\"\"\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "869fc8f1-4199-4525-80b3-26e74babc66a", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "with plt.xkcd():\n", + "\n", + " # Create figure with specified size\n", + " plt.figure(figsize=(10, 5))\n", + "\n", + " # KL divergence for W states\n", + " plt.subplot(1, 2, 1)\n", + " plt.bar(['unseen', 'seen'], [KL_w_absent, KL_w_present], color='k')\n", + " plt.ylabel('KL divergence, W states')\n", + " plt.xticks(fontsize=18)\n", + " plt.yticks(fontsize=18)\n", + "\n", + " # KL divergence for A states\n", + " plt.subplot(1, 2, 2)\n", + " plt.bar(['unseen', 'seen'], [KL_A_absent, KL_A_present], color='k')\n", + " plt.ylabel('KL divergence, A states')\n", + " plt.xticks(fontsize=18)\n", + " plt.yticks(fontsize=18)\n", + "\n", + " plt.tight_layout()\n", + "\n", + " # Show plot\n", + " plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "64ecb92c-bfe3-4e49-bd40-f11ffa685ece", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_HOSS_Bonus_Content\")" + ] + }, + { + "cell_type": "markdown", + "id": 
"bcd87344-d473-44af-a881-b68e5471d353", + "metadata": { + "execution": {} + }, + "source": [ + "---\n", + "# Discussion\n", + "This section contains an extra discussion exercise if you have time and inclination." + ] + }, + { + "cell_type": "markdown", + "id": "ca33829c-8d54-437e-ba33-d3003af51d7a", + "metadata": { + "execution": {} + }, + "source": [ + "In this bonus section, Megan and Anil will delve into the complexities of defining and testing for consciousness, particularly in the context of artificial intelligence. We will explore various theoretical perspectives, examine classic and contemporary tests for consciousness, and discuss the challenges and ethical implications of determining whether a system truly possesses conscious experience." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21b621ea-1639-4131-8ec3-9cdf34a64f77", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Video 2: Consciousness Bonus Content\n", + "\n", + "from ipywidgets import widgets\n", + "from IPython.display import YouTubeVideo\n", + "from IPython.display import IFrame\n", + "from IPython.display import display\n", + "\n", + "class PlayVideo(IFrame):\n", + " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", + " self.id = id\n", + " if source == 'Bilibili':\n", + " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", + " elif source == 'Osf':\n", + " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", + " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", + "\n", + "def display_videos(video_ids, W=400, H=300, fs=1):\n", + " tab_contents = []\n", + " for i, video_id in enumerate(video_ids):\n", + " out = widgets.Output()\n", + " with out:\n", + " if video_ids[i][0] == 'Youtube':\n", + " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", + " height=H, fs=fs, rel=0)\n", + " print(f'Video available 
at https://youtube.com/watch?v={video.id}')\n", + " else:\n", + " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", + " height=H, fs=fs, autoplay=False)\n", + " if video_ids[i][0] == 'Bilibili':\n", + " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", + " elif video_ids[i][0] == 'Osf':\n", + " print(f'Video available at https://osf.io/{video.id}')\n", + " display(video)\n", + " tab_contents.append(out)\n", + " return tab_contents\n", + "\n", + "video_ids = [('Youtube', '00dL8q7WgcU'), ('Bilibili', 'BV12n4y1Q7C2')]\n", + "tab_contents = display_videos(video_ids, W=854, H=480)\n", + "tabs = widgets.Tab()\n", + "tabs.children = tab_contents\n", + "for i in range(len(tab_contents)):\n", + " tabs.set_title(i, video_ids[i][0])\n", + "display(tabs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39c202fb-f580-4a96-8f8e-bad24ed1d55c", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Video_2\")" + ] + }, + { + "cell_type": "markdown", + "id": "e9e839f2-e237-4ed4-9045-56dc7b5f6d60", + "metadata": { + "execution": {} + }, + "source": [ + "## Discussion activity: Is it actually conscious?" + ] + }, + { + "cell_type": "markdown", + "id": "2720c0b5-6386-43a6-9647-f1245531c376", + "metadata": { + "execution": {} + }, + "source": [ + "We discussed the difference between these two...\n", + "- \"Forward\" tests: passing means the machine is conscious (or intelligent).\n", + "- \"Reverse\" tests: passing means humans are convinced that a machine is conscious (or intelligent).\n", + "\n", + "**Discuss!** If a system (AI, other animal, other human) exhibited all the \"right signs\" of being conscious, how can we know for sure it is actually conscious? How could you design a test to be a true forward test?\n", + "\n", + "- Room 1: I think you could design a forward test in this way... 
[share your ideas]\n", + "- Room 2: I think a forward test is impossible, and here's why [share your ideas]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "84958157-c165-4cc3-be76-408999cf44ad", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Discussion_activity\")" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "include_colab_link": true, + "name": "W2D5_Tutorial3", + "provenance": [], + "toc_visible": true + }, + "kernel": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.22" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial1_Solution_96fe639d.py b/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial1_Solution_96fe639d.py deleted file mode 100644 index 862854695..000000000 --- a/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial1_Solution_96fe639d.py +++ /dev/null @@ -1,101 +0,0 @@ - -# Experiment parameters -mu = np.array([[0.5, 0.5], [3.5, 0.5], [0.5, 3.5]]) -Nsubjects = 30 -Ntrials = 600 -cond = np.concatenate((np.ones(Ntrials//3), np.ones(Ntrials//3)*2, np.ones(Ntrials//3)*3)) -Wprior = [0.5, 0.5] -Aprior = 0.5 - -# Sensory precision values -gamma = np.linspace(0.1, 10, 6) - -# Initialize lists for results -all_KL_w_yes = [] -sem_KL_w_yes = [] -all_KL_w_no = [] -sem_KL_w_no = [] -all_KL_A_yes = [] -sem_KL_A_yes = [] -all_KL_A_no = [] -sem_KL_A_no = [] -all_prob_y = [] - -for y in tqdm(gamma, desc='Processing gammas'): - Sigma = np.diag([1./np.sqrt(y)]*2) - 
mean_KL_w = np.zeros((Nsubjects, 4)) - mean_KL_A = np.zeros((Nsubjects, 4)) - prob_y = np.zeros(Nsubjects) - - for s in tqdm(range(Nsubjects), desc=f'Subjects for gamma={y}', leave=False): - KL_w = np.zeros(len(cond)) - KL_A = np.zeros(len(cond)) - posteriorAware = np.zeros(len(cond)) - - # Generate sensory samples - X = np.array([multivariate_normal.rvs(mean=mu[int(c)-1, :], cov=Sigma) for c in cond]) - - # Model inversion for each trial - for i, x in enumerate(X): - post_w, post_A, KL_w[i], KL_A[i] = HOSS_evaluate(x, mu, Sigma, Aprior, Wprior) - posteriorAware[i] = post_A[1] # Assuming post_A is a tuple with awareness probability at index 1 - - binaryAware = posteriorAware > 0.5 - for i in range(4): - conditions = [(cond == 1), (cond != 1), (cond == 1), (cond != 1)] - aware_conditions = [(binaryAware == 0), (binaryAware == 0), (binaryAware == 1), (binaryAware == 1)] - mean_KL_w[s, i] = np.mean(KL_w[np.logical_and(aware_conditions[i], conditions[i])]) - mean_KL_A[s, i] = np.mean(KL_A[np.logical_and(aware_conditions[i], conditions[i])]) - - prob_y[s] = np.mean(binaryAware[cond != 1]) - - # Aggregate results across subjects - all_KL_w_yes.append(np.nanmean(mean_KL_w[:, 2:4].flatten())) - sem_KL_w_yes.append(np.nanstd(mean_KL_w[:, 2:4].flatten()) / np.sqrt(Nsubjects)) - all_KL_w_no.append(np.nanmean(mean_KL_w[:, :2].flatten())) - sem_KL_w_no.append(np.nanstd(mean_KL_w[:, :2].flatten()) / np.sqrt(Nsubjects)) - all_KL_A_yes.append(np.nanmean(mean_KL_A[:, 2:4].flatten())) - sem_KL_A_yes.append(np.nanstd(mean_KL_A[:, 2:4].flatten()) / np.sqrt(Nsubjects)) - all_KL_A_no.append(np.nanmean(mean_KL_A[:, :2].flatten())) - sem_KL_A_no.append(np.nanstd(mean_KL_A[:, :2].flatten()) / np.sqrt(Nsubjects)) - all_prob_y.append(np.nanmean(prob_y)) - -with plt.xkcd(): - - # Create figure - plt.figure(figsize=(10, 5)) - - # First subplot: Probability of reporting "seen" for w_1 or w_2 - plt.subplot(1, 3, 1) - plt.plot(gamma, all_prob_y, linewidth=2) - plt.xlabel('Stimulus strength') - 
plt.ylabel('Prob. report "seen" for w_1 or w_2') - plt.xticks(fontsize=14) - plt.yticks(fontsize=14) - plt.box(False) - - # Second subplot: K-L divergence, perceptual states - plt.subplot(1, 3, 2) - plt.errorbar(gamma, all_KL_w_yes, yerr=sem_KL_w_yes, linewidth=2, label='Seen') - plt.errorbar(gamma, all_KL_w_no, yerr=sem_KL_w_no, linewidth=2, label='Unseen') - plt.legend(frameon=False) - plt.xlabel('Stimulus strength') - plt.ylabel('KL-divergence, perceptual states') - plt.xticks(fontsize=14) - plt.yticks(fontsize=14) - plt.box(False) - - # Third subplot: K-L divergence, awareness state - plt.subplot(1, 3, 3) - plt.errorbar(gamma, all_KL_A_yes, yerr=sem_KL_A_yes, linewidth=2, label='Seen') - plt.errorbar(gamma, all_KL_A_no, yerr=sem_KL_A_no, linewidth=2, label='Unseen') - plt.legend(frameon=False) - plt.xlabel('Stimulus strength') - plt.ylabel('KL-divergence, awareness state') - plt.xticks(fontsize=14) - plt.yticks(fontsize=14) - plt.box(False) - - # Adjust layout and display the figure - plt.tight_layout() - plt.show() \ No newline at end of file diff --git a/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial3_Solution_a926812a.py b/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial3_Solution_a926812a.py new file mode 100644 index 000000000..d3042d5f5 --- /dev/null +++ b/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial3_Solution_a926812a.py @@ -0,0 +1,42 @@ +class SecondOrderNetwork(nn.Module): + def __init__(self, use_gelu): + super(SecondOrderNetwork, self).__init__() + # Define a linear layer for comparing the difference between input and output of the first-order network + self.comparison_layer = nn.Linear(100, 100) + + # Linear layer for determining wagers, mapping from 100 features to a single output + self.wager = nn.Linear(100, 1) + + # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training + self.dropout = nn.Dropout(0.5) + + # Select activation function based on the `use_gelu` flag + 
self.activation = nn.GELU() if use_gelu else nn.ReLU() + + # Additional activation functions for potential use in network operations + self.sigmoid = torch.sigmoid + + self.softmax = nn.Softmax(dim=-1) + + # Initialize the weights of the network + self._init_weights() + + def _init_weights(self): + # Uniformly initialize weights for the comparison and wager layers + init.uniform_(self.comparison_layer.weight, -1.0, 1.0) + init.uniform_(self.wager.weight, 0.0, 0.1) + + def forward(self, first_order_input, first_order_output): + # Calculate the difference between the first-order input and output + comparison_matrix = first_order_input - first_order_output + + # Another option is to directly calculate the per-unit MSE to use as input for the comparison matrix + # comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input) + + # Pass the difference through the comparison layer and apply the chosen activation function + comparison_out = self.dropout(self.activation(self.comparison_layer(comparison_matrix))) + + # Calculate the wager value by applying sigmoid activation to the output of the wager layer + wager = self.sigmoid(self.wager(comparison_out)) + + return wager \ No newline at end of file diff --git a/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial1_Solution_f903bbb4.py b/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial3_Solution_f903bbb4.py similarity index 100% rename from tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial1_Solution_f903bbb4.py rename to tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial3_Solution_f903bbb4.py diff --git a/tutorials/W2D5_Mysteries/static/W2D5_Tutorial1_Solution_96fe639d_585.png b/tutorials/W2D5_Mysteries/static/W2D5_Tutorial1_Solution_96fe639d_585.png deleted file mode 100644 index 70b65c6d2..000000000 Binary files a/tutorials/W2D5_Mysteries/static/W2D5_Tutorial1_Solution_96fe639d_585.png and /dev/null differ diff --git a/tutorials/W2D5_Mysteries/student/W2D5_Intro.ipynb b/tutorials/W2D5_Mysteries/student/W2D5_Intro.ipynb index
bc018d3c2..e7bea670c 100644 --- a/tutorials/W2D5_Mysteries/student/W2D5_Intro.ipynb +++ b/tutorials/W2D5_Mysteries/student/W2D5_Intro.ipynb @@ -59,6 +59,19 @@ "feedback_prefix = \"W2D5_Intro\"" ] }, + { + "cell_type": "markdown", + "metadata": { + "execution": {} + }, + "source": [ + "# Mysteries\n", + "\n", + "Welcome to the final day of the NeuroAI course! You've covered a wide range of topics, and we hope you've enjoyed the content we've put together and that you've put your mind to work absorbing both the low-level and the high-level details of this - at times - tricky and mathematically detailed content. As you can tell from the title of this final day, we're switching to a different type of educational content: we're leaving you with some of the open mysteries in the field and talking you through some of the ongoing work aimed at finding solutions.\n", + "\n", + "We hope that, with the tools we've equipped you with, you'll be inspired by some of these active mysteries - and perhaps your names will appear on future papers that help us understand the underlying mechanisms behind these fascinating ideas."
+ ] + }, { "cell_type": "markdown", "metadata": { @@ -214,7 +227,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb b/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb index e9e236d98..96aa0c2e8 100644 --- a/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb +++ b/tutorials/W2D5_Mysteries/student/W2D5_Tutorial1.ipynb @@ -25,9 +25,9 @@ "\n", "__Content creators:__ Steve Fleming, Guillaume Dumas, Samuele Bolotta, Juan David Vargas, Hakwan Lau, Anil Seth, Megan Peters\n", "\n", - "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault, Alex Murphy\n", "\n", - "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault\n" + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault, Alex Murphy\n" ] }, { @@ -50,7 +50,9 @@ "\n", "2. Explore core frameworks for analyzing consciousness, including diagnostic criteria, and will compare objective probabilities with subjective credences.\n", "\n", - "3. Explore reductionist theories of consciousness, such as Global Workspace Theory (GWT), theories of metacognition, and Higher-Order Thought (HOT) theories.\n" + "3. Explore reductionist theories of consciousness, such as Global Workspace Theory (GWT), theories of metacognition, and Higher-Order Thought (HOT) theories.\n", + "\n", + "The topic of consciousness and what it means to be *conscious* is a long-standing open question in neuroscience and recently has drawn a lot of attention in machine learning in the context of large language models and foundation models. 
People have claimed that these models exhibit sparks of consciousness, and a strong debate in the community continues to rage on. It is therefore likely to be a big issue that continues to gain traction in the space of NeuroAI, and we hope you can start to build some familiarity with the tools used to quantify and study this fascinating topic.\n" ] }, { @@ -450,6 +452,7 @@ "\n", " # Close the figure to free up memory\n", " plt.close(fig)\n", + "\n", "# Function to configure the training environment and load the models\n", "def get_test_patterns(factor):\n", " \"\"\"\n", @@ -532,7 +535,7 @@ " discrimination_performances.append(discrimination_performance)\n", "\n", "\n", - " chance_level = torch.Tensor( generate_chance_level((200*factor,100)))\n", + " chance_level = torch.Tensor( generate_chance_level((200*factor,100))).to(device)\n", " discrimination_random= round((chance_level[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", " print(\"chance level\" , discrimination_random)\n", "\n", @@ -1024,7 +1027,9 @@ " \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n", " \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n", "\n", - " return device" + " return device\n", + "\n", + "device = set_device()" ] }, { @@ -1275,9 +1280,24 @@ "source": [ "In this section, we are exploring an important concept in machine learning: the idea that the complexity we observe in the physical world often arises from simpler, independently functioning parts. Think of the world as being made up of different modules or units that usually operate on their own but sometimes interact with each other. This is similar to how different apps on your phone work independently but can share information when needed.\n", "\n", + "---\n", + "\n", + "### Modularity Recap\n", + "Remember W2D1, our day entitled **Macrocircuits**?
In Tutorial 3 of that day, the focus was on neural network modularity and we showed you that, compared to a single holistic architecture, having separable modular approaches, each with their own inductive biases, provided a much more efficient mechanism to model complex data. Not only that, but these sub-modules had stronger inductive biases and generalized easily to novel inputs. Today, we're also shining a spotlight on a similar idea, but from a much more integrative perspective, applied to the grand challenge of modeling consciousness. Those of you who are interested should review that tutorial and its ideas on modularity and how it can support complex systems more efficiently than holistic unitary mechanisms.\n", + "\n", + "---" + ] + }, + { + "cell_type": "markdown", + "id": "1fb33b12", + "metadata": { + "execution": {} + }, + "source": [ + "This idea is closely linked to the field of causal inference, which studies how these separate units or mechanisms cause and influence each other. The goal is to understand and model how these mechanisms work both individually and together. Importantly, these mechanisms often interact only minimally, which means they can keep working properly even if changes occur in other parts. This characteristic makes them very robust, or capable of handling disturbances well.\n", + "\n", - "A specific example from machine learning that uses this idea is called Recurrent Independent Mechanisms (RIMs). In RIMs, different parts of the model mostly work independently, but they can also communicate or \"pay attention\" to each other when it’s necessary. This setup allows for efficient and dynamic processing of information. The research paper available here (https://arxiv.org/pdf/1909.10893) discusses this approach in detail. It highlights the benefits of designing models that recognize and utilize the independence and occasional interactions of these mechanisms.
Such models are often more adaptable and can generalize better, meaning they perform well across a variety of different tasks or situations." ] }, { @@ -1287,7 +1307,7 @@ "execution": {} }, "source": [ - "### RIMs\n", + "### Recurrent Independent Mechanisms (RIMs)\n", "\n", "RIM networks are a type of recurrent neural network that process temporal sequences. Inputs are processed one element at a time: the different units of the network process each input, and a hidden state is updated and propagated through time. RIM networks can thus be used as a drop-in replacement for RNNs like LSTMs or GRUs. The key differences are that:\n", "\n", @@ -1297,26 +1317,26 @@ "\n", "**Selecting the input**\n", "\n", - "Each RIM unit gets activated and updated when the input is pertinent to it. Using key-value attention, the queries originate from the RIMs, while the keys and values are derived from the current input. The key-value attention mechanisms enable dynamic selection of which variable instance (i.e., which entity or object) will serve as input to each RIM mechanism:\n", + "Recall that in W1D5 (Microcircuits) we had a tutorial on **Attention** (Tutorial 3), where we covered how modern Transformer-based neural networks implement attention via the Query, Key and Value matrices. If these concepts are hazy, you might benefit from reviewing the tutorial videos from that day, as they are used in the RIM networks we will look at today. Each RIM unit is activated and updated only when the attention mechanism judges the input relevant to it. 
Using key-value attention (KV matrices), the queries (Q matrix) originate from the RIMs, while the keys and values are derived from the current input. In standard deep learning terminology, this is very closely related to the concept of **cross-attention**. The key-value attention mechanisms enable dynamic selection of which variable instance (i.e., which entity or object) will serve as input to each RIM mechanism:\n", "\n", "$$\n", "\\text{Attention}(Q, K, V) = \\text{softmax}\\left(\\frac{Q K^T}{\\sqrt{d}}\\right) V\n", "$$\n", "\n", - "Linear transformations are used to construct keys $K = XW^e $, values $ V = XW^v $ and queries $Q = h_t W^q_k$.\n", + "Linear transformations are used to construct keys $K = XW^k $, values $ V = XW^v $ and queries $Q = h_t W^q_i$.\n", "\n", "Here:\n", "\n", - "* $ W^v $ is a matrix mapping from an input element to the corresponding value vector for the weighted attention\n", - "* $ W^e $ is a weight matrix which maps the input to the keys.\n", - "* $ W^q_k $ is a per-RIM weight matrix which maps from the RIM’s hidden state to its queries.\n", + "* $ W^k $ is a weight matrix which maps the input to the keys (Key matrix)\n", + "* $ W^v $ is a matrix mapping from an input element to the corresponding value vector for the weighted attention (Value matrix)\n", + "* $ W^q_i $ is a per-RIM weight matrix which maps from the RIM’s hidden state to its queries (Query matrix)\n", "* $h_t$ is the hidden state for a RIM mechanism.\n", "\n", "\n", "$\\oplus$ refers to the row-level concatenation operator. 
The attention thus is:\n", "\n", "$$\n", - "A^{(\\text{in})}_k = \\text{softmax}\\left(\\frac{h_t W^q_k (XW^e)^T}{\\sqrt{d_e}}\\right) XW^v, \\text{ where } \\theta^{(\\text{in})}_k = (W^q_k, W^e, W^v)\n", + "A^{(\\text{in})}_i = \\text{softmax}\\left(\\frac{h_t W^q_i (XW^k)^T}{\\sqrt{d_e}}\\right) XW^v, \\text{ where } \\theta^{(\\text{in})}_i = (W^q_i, W^k, W^v)\n", "$$\n", "\n", "At each step, the top-k RIMs are selected based on their attention scores for the actual input. Essentially, the RIMs compete at each step to read from the input, and only the RIMs that prevail in this competition are allowed to read from the input and update their state." @@ -1341,10 +1361,10 @@ "source": [ "This figure shows how RIMs work over two steps.\n", "\n", - "- Query generation: each RIM starts by creating a query. This query helps each RIM pull out the necessary information from the data it receives at that moment.\n", - "- Attention-based selection: on the right side of the figure, you can see that some RIMs are chosen to be active (colored in blue) and others stay inactive (colored in white). This selection is made using a special scoring system called attention, which picks RIMs based on how relevant they are to the current visual inputs.\n", - "- State transition for active RIMs: the RIMs that get activated update their internal states according to their specific rules, using the information they've gathered. The RIMs that aren’t activated don’t change and keep their previous states.\n", - "- Communication between RIMs: finally, the active RIMs share information with each other, but this communication is limited. They use a system similar to key-value pairing, which helps them share only the most important information needed for the next step." + "- **Query generation**: each RIM starts by creating a query. 
This query helps each RIM pull out the necessary information from the data it receives at that moment.\n", + "- **Attention-based selection**: on the right side of the figure, you can see that some RIMs are chosen to be active (colored in blue) and others stay inactive (colored in white). This selection is made using a special scoring system called attention, which picks RIMs based on how relevant they are to the current visual inputs.\n", + "- **State transition for active RIMs**: the RIMs that get activated update their internal states according to their specific rules, using the information they've gathered. The RIMs that aren’t activated don’t change and keep their previous states.\n", + "- **Communication between RIMs**: finally, the active RIMs share information with each other, but this communication is limited. They use a system similar to key-value pairing, which helps them share only the most important information needed for the next step." ] }, { @@ -1354,7 +1374,7 @@ "execution": {} }, "source": [ - "To make this more concrete, consider the example of modeling the motion of two balls. We can think of each ball as an independent mechanism. Although both balls are affected by Earth’s gravity, and very slightly by each other, they generally move independently. They only interact significantly when they collide. This model captures the essence of independent mechanisms interacting sparsely, a key idea in developing more effective and generalizable AI systems.\n", + "To make this more concrete, consider the example of modeling the motion of two balls. We can think of each ball as an independent mechanism. Although both balls are affected by Earth’s gravity, and very slightly by each other, they generally move independently. The only interaction that is significant occurs when they collide. 
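The two-ball example can be made concrete with a tiny simulation. This is an illustrative sketch with made-up numbers, not code from the tutorial: each mechanism updates from its own state alone, and information is exchanged only in the rare collision event.

```python
# Toy sketch of sparsely interacting mechanisms: two balls under gravity.
# All names and numbers here are hypothetical illustrations.

def step(balls, dt=0.1, g=-9.8, radius=0.5):
    """Advance each ball independently, then apply the sparse interaction."""
    # Independent dynamics: each ball's update uses only its own state.
    for b in balls:
        b["vy"] += g * dt
        b["x"] += b["vx"] * dt
        b["y"] += b["vy"] * dt
    # Sparse interaction: only while overlapping do the balls exchange
    # information (a crude elastic collision swapping horizontal velocities).
    a, c = balls
    if abs(a["x"] - c["x"]) < 2 * radius and abs(a["y"] - c["y"]) < 2 * radius:
        a["vx"], c["vx"] = c["vx"], a["vx"]
    return balls

balls = [
    {"x": 0.0, "y": 10.0, "vx": 1.0, "vy": 0.0},
    {"x": 5.05, "y": 10.0, "vx": -1.0, "vy": 0.0},
]
for _ in range(25):
    balls = step(balls)
# After the single collision, the balls have swapped horizontal velocities.
```

Notice that the `step` function touches both balls in only one place, the collision test; everything else factors into per-mechanism updates.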
This model captures the essence of independent mechanisms interacting **sparsely**, a key idea in developing more effective and generalizable AI systems (see W1D5 - Tutorial 1 for the tutorial devoted entirely to sparsity and its benefits).\n", "\n", "Now, let's download the RIM model!" ] @@ -1423,11 +1443,11 @@ "\n", "This is the test setup:\n", "\n", - "1. Train on 14x14 images of MNIST digits\n", + "1. Train on `14x14` images of MNIST digits\n", "2. Test on:\n", - " - 16x16 images (validation set 1)\n", - " - 19x19 images (validation set 2)\n", - " - 24x24 images (validation set 3)\n", + " - `16x16` images (validation set 1)\n", + " - `19x19` images (validation set 2)\n", + " - `24x24` images (validation set 3)\n", "\n", "This approach helps to understand whether the model can still recognize the digits accurately even when they appear at different scales or resolutions than those on which it was originally trained. By testing the model on various image sizes, we can determine how flexible and effective the model is at dealing with variations in input data.\n", "\n", @@ -1655,7 +1675,7 @@ "execution": {} }, "source": [ - "The accuracy of the model on 16x16 images is fairly close to what was observed on smaller images, indicating that the increase in size to 16x16 does not significantly impact the model's ability to recognize the images. However, RIMs demonstrate generalization better, when working with the larger 19x19 and 24x24 images - compared to LSTMs." + "The accuracy of the model on `16x16` images is fairly close to what was observed on smaller images, indicating that the increase in size to `16x16` does not significantly impact the model's ability to recognize the images. However, RIMs generalize better than LSTMs when working with the larger `19x19` and `24x24` images." 
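The input-attention equations and top-k competition described earlier can be tied together in a minimal NumPy sketch. The dimensions are made up and the selection rule is simplified (in the paper, the RIMs attending least to a "null" input row win the competition); this is not the implementation inside the model we just downloaded.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rims, d_hid, d_in, d_key, top_k = 4, 8, 6, 5, 2

h = rng.standard_normal((n_rims, d_hid))  # hidden state h_t, one row per RIM
X = rng.standard_normal((3, d_in))        # input rows; treat row 0 as the "null" element

W_q = rng.standard_normal((n_rims, d_hid, d_key))  # per-RIM query maps W^q_i
W_k = rng.standard_normal((d_in, d_key))           # shared key map W^k
W_v = rng.standard_normal((d_in, d_hid))           # shared value map W^v

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

Q = np.einsum("nh,nhk->nk", h, W_q)          # one query per RIM
scores = Q @ (X @ W_k).T / np.sqrt(d_key)    # (n_rims, n_input_rows)
attn = softmax(scores)
A_in = attn @ (X @ W_v)                      # attended input A^(in)_i per RIM

# Competition: the k RIMs attending least to the null row become active.
active = np.argsort(attn[:, 0])[:top_k]
```

Only the RIMs indexed by `active` would then update their hidden states; the rest keep their previous state unchanged.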
] }, { @@ -1811,9 +1831,9 @@ "execution": {} }, "source": [ - "In this section, we explore a deep learning model based on Global Workspace Theory from cognitive neuroscience. You can read more about this model in the linked research paper here (https://arxiv.org/pdf/2103.01197.pdf). The core idea behind this model is the use of a \"shared global workspace\" which serves as a coordination platform for the various specialized modules within the network.\n", + "In this section, we explore a deep learning model based on Global Workspace Theory from cognitive neuroscience. You can read more about this model in the linked research paper here (https://arxiv.org/pdf/2103.01197.pdf). The core idea behind this model is the use of a *shared global workspace* which serves as a coordination platform for the various specialized modules within the network.\n", "\n", - "Essentially, the model incorporates multiple specialist modules, each focusing on different aspects of a problem. Unlike in the RIM mechanism, these modules do not communicate directly with each other, but rather interact through a central shared memory. Communication with the centeral shared memory is handled, once again, by an attention mechanism." + "Essentially, the model incorporates multiple specialist modules, each focusing on different aspects of a problem. Unlike in the RIM mechanism, these modules do not communicate *directly* with each other, but rather interact *indirectly* through a central shared memory. Communication with the central shared memory is handled, once again, by an attention mechanism." ] }, { @@ -1833,7 +1853,7 @@ "execution": {} }, "source": [ - "By centralizing communication this way, the model mimics how a human brain might focus only on the most relevant information at any given time. It mimics a sort of \"cognitive economy,\" where only the most relevant data is processed and shared among modules, reducing redundancy and enhancing the overall performance of the system. 
Moreover, the theory embeds some of the assumptions of the Global Workspace Theory (GWT) of consciousness, which suggests that consciousness arises from the ability of various brain processes to access a shared information platform, the **Global Workspace**." + "By centralizing communication this way, the model mimics how a human brain might focus only on the most relevant information at any given time. It implements a sort of \"cognitive economy,\" where only the most relevant data is processed and shared among modules, filtered through a bottleneck that forces the model to use an efficient, redundancy-reducing representation, which enhances the overall performance of the system. Moreover, the theory embeds some of the assumptions of the Global Workspace Theory (GWT) of consciousness, which suggests that consciousness arises from the ability of various brain processes to access a shared information platform, the **Global Workspace**." ] }, { @@ -1853,13 +1873,17 @@ "execution": {} }, "source": [ - "In our model, the interaction among the modules (or specialists) and the shared workspace is managed by a key-query-value cross-attention mechanism. Here’s how it works:\n", + "In our model, the interaction among the modules (or specialists) and the shared workspace is managed by a QKV cross-attention mechanism, as explained above. \n", + "\n", + "Here’s how it works:\n", + "\n", + "- **Key**: Each specialist module generates a key which represents the type of information the module wants to share.\n", + "- **Query**: The workspace generates a query at each computational step. 
This query represents what the workspace needs to know next to facilitate the overall task.\n", + "- **Value**: Each specialist also prepares a value, which is the actual information it proposes to add to the workspace.\n", "\n", - "- Key: Each specialist module generates a key which represents the type of information the module wants to share.\n", - "- Query: The workspace generates a query at each computational step. This query represents what the workspace needs to know next to facilitate the overall task.\n", - "- Value: Each specialist also prepares a value, which is the actual information it proposes to add to the workspace.\n", + "If any of this is still unclear, please refer back to the explanation above, where these terms are defined in more detail.\n", "\n", - "Fill in the code below to implement this mechanism." + "Your task is to fill in the code below to implement this mechanism." ] }, { @@ -1977,7 +2001,9 @@ "execution": {} }, "source": [ - "After updating the shared workspace with the most critical signals, this information is then broadcast back to all specialists. Each specialist updates its state using this broadcast information, which can involve an attention mechanism for consolidation and an update function (like an LSTM or GRU step) based on the new combined state. Let's add this method!" + "After updating the shared workspace with the most critical signals, this information is then broadcast back to all specialists. Each specialist updates its state using this broadcast information, which can involve an attention mechanism for consolidation and an update function (like an LSTM or GRU step) based on the new combined state. \n", + "\n", + "Let's add this method!" 
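A rough standalone sketch of this write-then-broadcast cycle is below, in plain NumPy with illustrative shapes (deliberately not the solution to the exercise): the workspace queries the specialists' keys to decide what to write, and the specialists then query the workspace to read the broadcast back.

```python
import numpy as np

rng = np.random.default_rng(1)
n_spec, d, n_slots = 6, 16, 4          # specialists, feature dim, workspace slots

h = rng.standard_normal((n_spec, d))   # specialist hidden states
M = rng.standard_normal((n_slots, d))  # shared workspace memory

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

Wq = rng.standard_normal((d, d))
Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d))

# Write step: the workspace supplies queries; specialists supply keys and values.
attn_write = softmax((M @ Wq) @ (h @ Wk).T / np.sqrt(d))   # (n_slots, n_spec)
M = M + attn_write @ (h @ Wv)                              # workspace absorbs signals

# Broadcast step: specialists supply queries; workspace slots supply keys/values.
attn_read = softmax((h @ Wq) @ (M @ Wk).T / np.sqrt(d))    # (n_spec, n_slots)
h = h + attn_read @ (M @ Wv)                               # every specialist reads back
```

In the actual model, the write step typically retains only the top-scoring specialists, and the write and broadcast steps use separately learned projection matrices; both details are omitted here for brevity.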
] }, { @@ -2152,9 +2178,7 @@ "execution": {} }, "source": [ - "Blindsight is a neurological phenomenon where individuals with damage to their primary visual cortex can still respond to visual stimuli without consciously perceiving them.\n", - "\n", - "\n", + "Blindsight is a neurological phenomenon where individuals with damage to their primary visual cortex can still respond to visual stimuli without consciously perceiving them. Megan showed you a video of a real patient with this condition navigating a corridor, successfully avoiding objects that researchers had strategically placed in his way.\n", "\n", "To study this, we use a simulated dataset that mimics the conditions of blindsight. This dataset contains 400 patterns, equally split between two types:\n", "\n", @@ -2708,11 +2732,11 @@ "execution": {} }, "source": [ - "In this section, we'll merge ideas from earlier discussions to present a fresh perspective on how conscious awareness might arise in neural systems. This view comes from higher-order theory, which suggests that consciousness stems from the ability to monitor basic, or first-order, information processing activities, instead of merely broadcasting information globally. This concept agrees with global workspace theories that emphasize the need for a comprehensive monitor that oversees various first-order processes. Moreover, it extends the ideas discussed previously about the role of a second-order network, which helps us understand phenomena like blindsight, where a person can respond to visual stimuli without consciously seeing them.\n", + "In this section, we'll merge ideas from earlier discussions to present a fresh perspective on how conscious awareness might arise in neural systems. This view comes from higher-order theory, which suggests that consciousness stems from the ability to monitor basic, or first-order, information processing activities, instead of merely broadcasting information globally. 
This concept emphasizes the need for a comprehensive monitor that oversees various first-order processes (like GWT). It extends the idea of the role of a second-order network, which helps us understand phenomena like blindsight.\n", "\n", - "To analyze how our brains handle and update perceptions, we'll operate within a simplified Bayesian framework. This framework helps us evaluate how we perceive reality based on the information we receive. For example, if you hear rustling leaves, your brain calculates the likelihood of it being caused by the wind versus an animal. This calculation involves updating what we initially guess (our prior belief) with new evidence (observed data), resulting in a new, more informed belief (posterior probability).\n", + "To analyze how our brains handle and update perceptions, we'll use a Bayesian framework. This framework helps us evaluate how we perceive reality based on the information we receive. For example, if you hear rustling leaves, your brain calculates the likelihood of it being caused by the wind versus an animal. This calculation involves updating what we initially guess (our prior belief) with new evidence (observed data), resulting in a new, more informed belief (posterior probability).\n", "\n", - "The function below calculates these updated beliefs and uses Kullback-Leibler (KL) divergence to quantify how much the new information changes our understanding. The KL divergence is a way of measuring the 'distance' between the initial belief and your updated belief. In essence, it's measuring how much you have to change your mind given new evidence.\n", + "The function below calculates these updated beliefs and uses *Kullback-Leibler (KL)* divergence to quantify how much the new information changes our understanding. The KL divergence is a way of measuring the 'distance' between the initial belief and your updated belief. 
In essence, it's measuring how much you have to change your mind given new evidence.\n", "\n", "We base our analysis on a flat, or single-layer, Bayesian network model. This model directly connects our sensory inputs with our perceptual states, simplifying the complex interactions in our brain into a more manageable form. By stripping away the complexities of multi-layered networks, we focus purely on how direct observations impact our consciousness. This simplified approach helps us to better understand the intricate dance between perception and awareness in our neural systems." ] @@ -2760,16 +2784,6 @@ " return post_W, KL_W" ] }, - { - "cell_type": "markdown", - "id": "e4cfba4a-b48a-48c5-a554-f03e7096af2e", - "metadata": { - "execution": {} - }, - "source": [ - "**Make our stimulus space**" - ] - }, { "cell_type": "markdown", "id": "11ffb999-c213-4400-8f1b-dac5b42ff5e1", @@ -2777,11 +2791,13 @@ "execution": {} }, "source": [ - "The model we are using is grounded in classical \"signal detection theory\", or SDT for short. SDT is in turn a special case of a Bayesian generative model, in which an arbitrary \"evidence\" value is drawn from an unknown distribution, and the task of the observer is to infer which distribution this evidence came from.\n", + "### Defining our Stimulus Space\n", + "\n", + "The model we are using is grounded in classical *Signal Detection Theory* (SDT). SDT is a special case of a Bayesian generative model, in which an arbitrary *evidence* value is drawn from an unknown distribution. The task of the observer is to infer *which distribution* this evidence came from.\n", "\n", - "In SDT, an observer receives a piece of evidence—this could be any sensory input, like a sound, a light signal, or a statistical data point. The evidence comes from one of several potential distributions. 
Each distribution represents a different \"state of the world.\" For instance, one distribution might represent the presence of a signal (like a beep), while another might represent just noise. The observer uses Bayesian inference to assess the probability that the received evidence came from one distribution or another. This involves updating their beliefs (probabilities) based on the new evidence. Based on the probabilities calculated through Bayesian inference, the observer decides which distribution most likely produced the evidence.\n", + "In SDT, an observer receives a piece of evidence (this could be any sensory input, like a sound, a light signal, or a statistical data point). The evidence comes from one of several potential distributions. Each distribution represents a different *state of the world.* For instance, one distribution might represent the presence of a signal (like a beep), while another might represent just noise. The observer uses Bayesian inference to assess the probability that the received evidence came from one distribution or another. This involves updating their beliefs (probabilities) based on the new evidence. Based on the probabilities calculated through Bayesian inference, the observer decides which distribution most likely produced the evidence.\n", "\n", - "Let's now imagine we have two categories, A and B - for instance, left- and right-tilted visual stimuli. The sensory \"evidence\" can be written as 2D vector, where the first element is evidence for A, and the second element evidence for B:" + "Let's now imagine we have two categories, A and B - for instance, left- and right-tilted visual stimuli. The sensory *evidence* can be written as a 2D vector, where the first element is evidence for A, and the second element is evidence for B:" ] }, { @@ -2926,7 +2942,7 @@ "execution": {} }, "source": [ - "The model partitions the stimuli in the expected way. 
KL divergence is higher further away from the boundaries, as measuring stimuli far away from the boundaries makes the model rapidly update its beliefs. This is because the model is more certain about the stimuli's class when they are far from the boundaries." + "The model partitions the stimuli in the expected way. KL divergence is higher further away from the boundaries, as measuring stimuli far away from the boundaries makes the model rapidly update its beliefs. This is because the model is more *certain* about the stimuli's class when they are far from the boundaries." ] }, { @@ -2936,11 +2952,11 @@ "execution": {} }, "source": [ - "**Add in higher-order node for global detection**\n", + "#### Add in higher-order node for global detection\n", "\n", - "So far, our model has been straightforward, or \"flat,\" where each perceptual state (like leftward tilt, rightward tilt, or no stimulus) is treated separately. However, real-life perception often requires more complex judgments about the presence or absence of any stimulus, not just identifying specific types. This is where a higher-order node comes into play.\n", + "So far, our model has been straightforward, or *flat*, where each perceptual state (like leftward tilt, rightward tilt, or no stimulus) is treated separately. However, real-life perception often requires more complex judgments about the presence or absence of any stimulus, not just identifying specific types. This is where a higher-order node comes into play.\n", "\n", - "**Introducing the \"A\" Level:**\n", + "#### Introducing the \"A\" Level:\n", "\n", "Think of the \"A\" level as a kind of overseer or monitor that watches over the lower-level states ($w_1$, $w_2$, etc.). This higher-order node isn't concerned with the specific content of the stimulus (like which direction something is tilting) but rather with whether there's any significant stimulus at all versus just noise. 
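Before adding the higher-order node, the first-order belief update and its KL divergence readout can be sketched in miniature. The means and noise level below are made up for illustration; the tutorial's own `post_W`/`KL_W` function is the authoritative version.

```python
import numpy as np

# Hypothetical miniature: three world states (absent, tilt-left, tilt-right),
# each predicting a different mean 2D evidence vector.
mu = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
prior = np.array([1 / 3, 1 / 3, 1 / 3])
sigma = 1.0

def posterior_and_kl(x, mu, prior, sigma):
    """Posterior over world states given evidence x, plus KL(posterior || prior)."""
    lik = np.exp(-np.sum((x - mu) ** 2, axis=1) / (2 * sigma ** 2))  # Gaussian likelihoods
    post = prior * lik
    post = post / post.sum()                   # Bayes' rule, normalised
    kl = np.sum(post * np.log(post / prior))   # belief change, in nats
    return post, kl

post_strong, kl_strong = posterior_and_kl(np.array([2.0, 0.0]), mu, prior, sigma)
post_weak, kl_weak = posterior_and_kl(np.array([0.0, 0.0]), mu, prior, sigma)
# Strong evidence moves the belief further from the prior than ambiguous evidence,
# so kl_strong exceeds kl_weak.
```

The same update rule applies unchanged at the higher-order "A" level; only the states being inferred (stimulus present vs. absent) differ.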
It takes inputs from the same data (pairs of $X$'s), but it adds a layer of awareness. It evaluates whether the data points suggest any meaningful content or if they're likely just random noise.\n", "\n", @@ -3202,8 +3218,11 @@ "execution": {} }, "source": [ - "**Simulate ignition (asymmetry vs. symmetry)**\n", + "We have included some further details on the notion of ignition. Please feel free to toggle the switch below to learn more. If you're running low on time, then please feel free to run the cell below and come back to this section. The outro video will also cover the broad overview of this concept.\n", "\n", + "
\n", + " Simulate Ignition (asymmetry vs symmetry)\n", + " \n", "The HOSS architecture is designed to detect whether something is there or not. When it detects something, it produces larger prediction errors than when it detects nothing. These prediction errors are tracked using a method called Kullback-Leibler (KL) divergence, particularly at a certain level within the model known as the W level.\n", "\n", "This increase in prediction errors when something is detected is similar to what happens in the human brain, a phenomenon known as global ignition responses. These are big surges in brain activity that happen when we become conscious of something. Research like that conducted by Del Cul et al. (2007) and Dehaene and Changeux (2011) supports this concept, linking it to the global workspace model. This model describes consciousness as the sharing of information across different parts of the brain.\n", @@ -3212,13 +3231,15 @@ "\n", "We then classify these prediction errors based on whether the model recognizes a stimulus as \"seen\" or \"unseen.\" If the model has a response indicating \"seen,\" it shows more activity than when it indicates \"unseen.\" This is what we refer to as ignition — more activity for \"seen\" stimuli.\n", "\n", - "However, it's crucial to understand that in the HOSS model, these ignition-like responses don't directly cause the global sharing of information in the network. Rather, they are secondary effects or byproducts of other calculations happening within the network. 
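A toy version of this seen/unseen asymmetry can be simulated directly (a made-up miniature, not the HOSS code itself): with a prior that favours "nothing there", stimulus-present trials force much larger belief updates, i.e. larger KL at the W level, than noise trials do.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])  # absent, left, right (made up)
prior = np.array([0.8, 0.1, 0.1])                    # prior favours "absent"

def kl_of_trial(x):
    """KL between posterior and prior after observing 2D evidence x."""
    lik = np.exp(-np.sum((x - mu) ** 2, axis=1) / 2.0)
    post = prior * lik / np.sum(prior * lik)
    return np.sum(post * np.log(post / prior))

# "Seen" trials sample evidence around a real stimulus; "unseen" around noise.
seen = [kl_of_trial(mu[1] + rng.standard_normal(2)) for _ in range(500)]
unseen = [kl_of_trial(mu[0] + rng.standard_normal(2)) for _ in range(500)]
# Ignition-like asymmetry: detected stimuli drive larger average updates.
```

Averaged over trials, the KL for the stimulus-present condition comes out clearly above the noise condition, mirroring the ignition-like surge described above.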
Essentially, these bursts of activity are outcomes of deeper processes in the network, not the direct mechanisms for distributing information throughout the system.\n", + "\n", + "
" ] }, { "cell_type": "code", "execution_count": null, - "id": "b09f812a-f202-4f3d-ac66-247b322002e7", + "id": "af24eab7-5e81-4f04-bdb8-f192058d06b3", "metadata": { "execution": {} }, @@ -3246,13 +3267,7 @@ "sem_KL_A_no = []\n", "all_prob_y = []\n", "\n", - "##############################################################################\n", - "## TODO for students: Fill in the missing parts (...)\n", - "## Fill in the missing parts to complete the function and remove\n", - "raise NotImplementedError(\"Student exercise\")\n", - "##############################################################################\n", - "\n", - "for y in tqdm(..., desc='Processing gammas'):\n", + "for y in tqdm(gamma, desc='Processing gammas'):\n", " Sigma = np.diag([1./np.sqrt(y)]*2)\n", " mean_KL_w = np.zeros((Nsubjects, 4))\n", " mean_KL_A = np.zeros((Nsubjects, 4))\n", @@ -3332,22 +3347,6 @@ " plt.show()" ] }, - { - "cell_type": "markdown", - "id": "af24eab7-5e81-4f04-bdb8-f192058d06b3", - "metadata": { - "colab_type": "text", - "execution": {} - }, - "source": [ - "[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial1_Solution_96fe639d.py)\n", - "\n", - "*Example output:*\n", - "\n", - "Solution hint\n", - "\n" - ] - }, { "cell_type": "code", "execution_count": null, @@ -3487,9 +3486,9 @@ }, "source": [ "---\n", - "# Summary\n", + "# The Big Picture\n", "\n", - "Hakwan will now discuss the critical aspects and limitations of current consciousness studies, addressing the challenges in distinguishing theories of consciousness from those merely describing general brain functions." + "Hakwan will now discuss the critical aspects and limitations of current consciousness studies, addressing the challenges in distinguishing theories of consciousness from those merely describing general brain functions. Join us in the next two videos where we wrap up some of the big ideas and try to put them in context for you!" 
] }, { @@ -3647,723 +3646,9 @@ "execution": {} }, "source": [ - "Below you'll find some optional coding & discussion bonus content!" - ] - }, - { - "cell_type": "markdown", - "id": "f862cbc2-3222-484c-98cb-993f2b591b37", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "# Coding Bonus Section\n", - "This secton contains some extra coding exercises in case you have time and inclination." - ] - }, - { - "cell_type": "markdown", - "id": "76dd7488-6558-4022-8541-22765f2967c6", - "metadata": { - "execution": {} - }, - "source": [ - "## Bonus coding exersice 1: Train a first-order network\n", - "\n", - "This section invites you to engage with a straightforward, auto-generated dataset on blindsight, originally introduced by [Pasquali et al. in 2010](https://www.sciencedirect.com/science/article/abs/pii/S0010027710001794). Blindsight is a fascinating condition where individuals who are cortically blind due to damage in their primary visual cortex can still respond to visual stimuli without conscious perception. This intriguing phenomenon underscores the intricate nature of sensory processing and the brain's ability to process information without conscious awareness." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "8c79b0a2-8e12-44ea-a685-bba788f6685d", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Visualize the autogenerated data\n", - "factor=2\n", - "initialize_global()\n", - "set_pre, _ = create_patterns(0,factor)\n", - "plot_signal_max_and_indicator(set_pre.detach().cpu(), \"Example - Pre training dataset\")" - ] - }, - { - "cell_type": "markdown", - "id": "cff70408-8662-43f5-b930-fc2a6ffca323", - "metadata": { - "execution": {} - }, - "source": [ - "The pre-training dataset for the network consisted of 200 patterns. These were evenly divided: half were purely noise (with unit activations randomly chosen between 0.0 and 0.02), and the other half represented potential stimuli. 
In the stimulus patterns, 99 out of 100 units had activations ranging between 0.0 and 0.02, with one unique unit having an activation between 0.0 and 1.0." - ] - }, - { - "cell_type": "markdown", - "id": "0f45662a-08b4-44a4-89e1-200fc0c9cddb", - "metadata": { - "execution": {} - }, - "source": [ - "**Testing patterns**\n", - "\n", - "As we have seen before, the network underwent evaluations under three distinct conditions, each modifying the signal-to-noise ratio in a unique way to explore different degrees and types of blindness.\n", - "\n", - "Suprathreshold stimulus condition: here, the network was exposed to the identical set of 200 patterns used during pre-training, testing the network's response to familiar inputs.\n", - "\n", - "Subthreshold stimulus condition (blindsight simulation): this condition aimed to mimic blindsight. It was achieved by introducing a slight noise increment (+0.0012) to every input of the first-order network, barring the one designated as the stimulus. This setup tested the network's ability to discern faint signals amidst noise.\n", - "\n", - "Low vision condition: to simulate low vision, the activation levels of the stimuli were reduced. Unlike the range from 0.0 to 1.0 used in pre-training, the stimuli's activation levels were adjusted to span from 0.0 to 0.3. This condition examined the network's capability to recognize stimuli with diminished intensity." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "db58d78b-17d8-4651-801a-f06e568a7322", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "factor=2\n", - "# Compare your results with the patterns generate below\n", - "set_1, _ = create_patterns(0,factor)\n", - "set_2, _ = create_patterns(1,factor)\n", - "set_3, _ = create_patterns(2,factor)\n", - "\n", - "# Plot\n", - "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", - "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", - "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" - ] - }, - { - "cell_type": "markdown", - "id": "cd5c13e0-75e8-45c2-b1be-70496041364b", - "metadata": { - "execution": {} - }, - "source": [ - "### Activity 1: Building a network for a blindsight situation\n", - "\n", - "In this activity, we'll construct a neural network model using our auto-generated dataset, focusing on blindsight scenarios. The model will primarily consist of fully connected layers, establishing a straightforward, first-order network. The aim here is to assess the basic network's performance.\n", - "\n", - "**Steps to follow**\n", - "\n", - "1. Examine the network architecture: understand the structure of the neural network you're about to work with.\n", - "2. Visualize loss metrics: observe and analyze the network's performance during pre-training by visualizing the loss over epochs.\n", - "3. Evaluate the model: use the provided code snippets to calculate and interpret the model's accuracy, recall, and F1-score, giving you insight into the network's capabilities.\n", - "\n", - "**Understanding the process**\n", - "\n", - "The goal is to gain a thorough comprehension of the network's architecture and to interpret the pre-training results visually. 
This will provide a clearer picture of the model's potential and limitations.\n", - "\n", - "The network is designed as a backpropagation autoassociator. It features a 100-unit input layer, directly linked to a 40-unit hidden layer, which in turn connects to a 100-unit output layer. Initial connection weights are set within the range of -1.0 to 1.0 for the first-order network. To mitigate overfitting, dropout is employed within the network architecture. The architecture includes a configurable activation function. This flexibility allows for adjustments and tuning in Activity 3, aiming for optimal model performance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "94d0bcaf-8b49-4e35-b0d2-1b9dcc98b182", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "class FirstOrderNetwork(nn.Module):\n", - " def __init__(self, hidden_units, data_factor, use_gelu):\n", - " \"\"\"\n", - " Initializes the FirstOrderNetwork with specific configurations.\n", - "\n", - " Parameters:\n", - " - hidden_units (int): The number of units in the hidden layer.\n", - " - data_factor (int): Factor to scale the amount of data processed.\n", - " A factor of 1 indicates the default data amount,\n", - " while 10 indicates 10 times the default amount.\n", - " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", - " \"\"\"\n", - " super(FirstOrderNetwork, self).__init__()\n", - "\n", - " # Define the encoder, hidden, and decoder layers with specified units\n", - "\n", - " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", - " self.hidden= nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", - " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", - "\n", - " self.relu = nn.ReLU()\n", - " self.sigmoid = nn.Sigmoid()\n", - "\n", - "\n", - " # Dropout layer to prevent overfitting\n", - " self.dropout = nn.Dropout(0.1)\n", - "\n", - " # Set the data factor\n", - " 
self.data_factor = data_factor\n", - "\n", - " # Other activation functions for various purposes\n", - " self.softmax = nn.Softmax()\n", - "\n", - " # Initialize network weights\n", - " self.initialize_weights()\n", - "\n", - " def initialize_weights(self):\n", - " \"\"\"Initializes weights of the encoder, hidden, and decoder layers uniformly.\"\"\"\n", - " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", - "\n", - " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", - " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", - "\n", - " def encoder(self, x):\n", - " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", - " return h1\n", - "\n", - " def decoder(self,z):\n", - " #h2 = self.relu(self.hidden(z))\n", - " h2 = self.sigmoid(self.fc2(z))\n", - " return h2\n", - "\n", - "\n", - " def forward(self, x):\n", - " \"\"\"\n", - " Defines the forward pass through the network.\n", - "\n", - " Parameters:\n", - " - x (Tensor): The input tensor to the network.\n", - "\n", - " Returns:\n", - " - Tensor: The output of the network after passing through the layers and activations.\n", - " \"\"\"\n", - " h1 = self.encoder(x)\n", - " h2 = self.decoder(h1)\n", - "\n", - " return h1 , h2" - ] - }, - { - "cell_type": "markdown", - "id": "83e07f1a-540b-4bfa-8e9b-f114319f1f96", - "metadata": { - "execution": {} - }, - "source": [ - "For now, we will train the first order network only." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "7bfade3d-6385-459c-8f07-e3017264455a", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", - "# Hyperparameters\n", - "\n", - "# Hyperparameters\n", - "global optimizer ,n_epochs , learning_rate_1\n", - "learning_rate_1 = 0.5\n", - "n_epochs = 100\n", - "optimizer=\"ADAMAX\"\n", - "hidden=40\n", - "factor=2\n", - "gelu=False\n", - "gam=0.98\n", - "meta=True\n", - "stepsize=25\n", - "initialize_global()\n", - "\n", - "\n", - "# Networks instantiation\n", - "first_order_network = FirstOrderNetwork(hidden,factor,gelu).to(device)\n", - "second_order_network = SecondOrderNetwork(gelu).to(device) # We define it, but won't use it until activity 3\n", - "\n", - "# Loss function\n", - "criterion_1 = CAE_loss\n", - "\n", - "# Optimizer\n", - "optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", - "\n", - "# Learning rate schedulers\n", - "scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", - "\n", - "max_values_output_first_order = []\n", - "max_indices_output_first_order = []\n", - "max_values_patterns_tensor = []\n", - "max_indices_patterns_tensor = []\n", - "\n", - "# Training loop\n", - "for epoch in range(n_epochs):\n", - " # Generate training patterns and targets for each epoch.\n", - " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", - "\n", - " # Forward pass through the first-order network\n", - " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", - "\n", - " output_first_order=output_first_order.requires_grad_(True)\n", - "\n", - " # Skip computations for the second-order network\n", - " with torch.no_grad():\n", - "\n", - " # Potentially forward pass through the second-order network without tracking gradients\n", - " 
output_second_order = second_order_network(patterns_tensor, output_first_order)\n", - "\n", - " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", - " W = first_order_network.state_dict()['fc1.weight']\n", - " loss_1 = criterion_1( W, stim_present_tensor.view(-1, 100), output_first_order,\n", - " hidden_representation, lam )\n", - " # Backpropagate the first-order network's loss\n", - " loss_1.backward()\n", - "\n", - " # Update first-order network weights\n", - " optimizer_1.step()\n", - "\n", - " # Reset first-order optimizer gradients to zero for the next iteration\n", - "\n", - " # Update the first-order scheduler\n", - " scheduler_1.step()\n", - "\n", - " epoch_1_order[epoch] = loss_1.item()\n", - "\n", - " # Get max values and indices for output_first_order\n", - " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", - " max_inds_out[max_vals_out == 0] = 0\n", - " max_values_output_first_order.append(max_vals_out.tolist())\n", - " max_indices_output_first_order.append(max_inds_out.tolist())\n", - "\n", - " # Get max values and indices for patterns_tensor\n", - " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", - " max_inds_pat[max_vals_pat == 0] = 0\n", - " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", - " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", - "\n", - "\n", - "max_values_indices = (max_values_output_first_order[-1],\n", - " max_indices_output_first_order[-1],\n", - " max_values_patterns_tensor[-1],\n", - " max_indices_patterns_tensor[-1])\n", + "Tutorial 3 contains bonus material that extends what we've covered today. There, you'll find some optional coding & discussion bonus content! Feel free to bookmark it and come back whenever you are ready.
\n", "\n", - "\n", - "# Plot training loss curve\n", - "pre_train_plots(epoch_1_order, epoch_2_order, \"1st & 2nd Order Networks\" , max_value_indices )" - ] - }, - { - "cell_type": "markdown", - "id": "fa7d1ce9-da1b-4f78-b388-47bfcd50c6dd", - "metadata": { - "execution": {} - }, - "source": [ - "### Testing under 3 blindsight conditions\n", - "\n", - "We will now use the testing auto-generated datasets from activity 1 to test the network's performance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "2affe162-f4d9-495f-862a-65b0f50ca5ef", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "results_seed=[]\n", - "discrimination_seed=[]\n", - "\n", - "# Prepare networks for testing by calling the configuration function\n", - "testing_patterns, n_samples, loaded_model, loaded_model_2 = config_training(first_order_network, second_order_network, hidden, factor, gelu)\n", - "\n", - "# Perform testing using the defined function and plot the results\n", - "f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, loaded_model, loaded_model_2,factor)\n", - "\n", - "results_seed.append(results_for_plotting)\n", - "discrimination_seed.append(discrimination_performances)\n", - "# Assuming plot_testing is defined, call it to display results\n", - "plot_testing(results_seed,discrimination_seed, 1, \"Seed\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0d18302e-4657-4732-b6ef-f7439d2bb2fd", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_First_order_network\")" - ] - }, - { - "cell_type": "markdown", - "id": "8a694f1e-3f32-48fc-bce8-0b544d43ca62", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "## Bonus coding section 2: Plot surfaces for content / awareness inferences\n", - "\n", - 
"To explore the properties of the HOSS model, we can simulate inference at different levels of the hierarchy over the full 2D space of possible input X's. The left panel below shows that the probability of awareness (of any stimulus contents) rises in a graded manner from the lower left corner of the graph (low activation of any feature) to the upper right (high activation of both features). In contrast, the right panel shows that confidence in making a discrimination response (e.g. rightward vs. leftward) increases away from the major diagonal, as the model becomes sure that the sample was generated by either a leftward or rightward tilted stimulus.\n", - "\n", - "Together, the two surfaces make predictions about the relationships we might see between discrimination confidence and awareness in a simple psychophysics experiment. One notable prediction is that discrimination could still be possible - and lead to some degree of confidence - even when the higher-order node is \"reporting\" unawareness of the stimulus.\n", - "\n", - "Now, let's get hands on and plot those auto-generated patterns!\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "31503073-a7c0-4502-8d94-5ffa47a22926", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the grid\n", - "xgrid = np.arange(0, 2.01, 0.01)\n", - "\n", - "# Define the means for the Gaussian distributions\n", - "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", - "\n", - "# Define the covariance matrix\n", - "Sigma = np.array([[1, 0], [0, 1]])\n", - "\n", - "# Prior probabilities\n", - "Wprior = np.array([0.5, 0.5])\n", - "Aprior = 0.5\n", - "\n", - "# Initialize arrays to hold confidence and posterior probability\n", - "confW = np.zeros((len(xgrid), len(xgrid)))\n", - "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", - "\n", - "# Compute confidence and posterior 
probability for each point in the grid\n", - "for i, xi in tqdm(enumerate(xgrid), total=len(xgrid), desc='Outer Loop'):\n", - " for j, xj in enumerate(xgrid):\n", - " X = [xi, xj]\n", - " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", - " confW[i, j] = max(post_w[1], post_w[2])\n", - " posteriorAware[i, j] = post_A[1]\n", - "\n", - "with plt.xkcd():\n", - "\n", - " # Plotting\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # Posterior probability \"seen\"\n", - " plt.subplot(1, 2, 1)\n", - " plt.contourf(xgrid, xgrid, posteriorAware.T)\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('Posterior probability \"seen\"')\n", - " plt.axis('square')\n", - "\n", - " # Confidence in identity\n", - " plt.subplot(1, 2, 2)\n", - " contour_set = plt.contourf(xgrid, xgrid, confW.T)\n", - " plt.colorbar()\n", - " plt.contour(xgrid, xgrid, posteriorAware.T, levels=[0.5], linewidths=4, colors=['white']) # Line contour for threshold\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('Confidence in identity')\n", - " plt.axis('square')\n", - "\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "2d129657-62aa-42d1-970a-93fd67736b69", - "metadata": { - "execution": {} - }, - "source": [ - "**Simulate KL-divergence surfaces**\n", - "\n", - "We can also simulate KL-divergences (a measure of Bayesian surprise) at each layer in the network, which under predictive coding models of the brain, has been proposed to scale with neural activation (e.g., Friston, 2005; Summerfield & de Lange, 2014)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "66044263-c8de-49a9-a56b-2e7336cc737c", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "# Define the grid\n", - "xgrid = np.arange(0, 2.01, 0.01)\n", - "\n", - "# Define the means for the Gaussian distributions\n", - "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", - "\n", - "# Define the covariance matrix\n", - "Sigma = np.array([[1, 0], [0, 1]])\n", - "\n", - "# Prior probabilities\n", - "Wprior = np.array([0.5, 0.5])\n", - "Aprior = 0.5\n", - "\n", - "# Initialize arrays to hold confidence and posterior probability\n", - "confW = np.zeros((len(xgrid), len(xgrid)))\n", - "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", - "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", - "\n", - "# Compute confidence and posterior probability for each point in the grid\n", - "for i, xi in enumerate(xgrid):\n", - " for j, xj in enumerate(xgrid):\n", - " X = [xi, xj]\n", - " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", - "\n", - " confW[i, j] = max(post_w[1], post_w[2])\n", - " posteriorAware[i, j] = post_A[1]\n", - "\n", - "# Calculate the mean K-L divergence for absent and present awareness states\n", - "KL_A_absent = np.mean(KL_A[posteriorAware < 0.5])\n", - "KL_A_present = np.mean(KL_A[posteriorAware >= 0.5])\n", - "KL_w_absent = np.mean(KL_w[posteriorAware < 0.5])\n", - "KL_w_present = np.mean(KL_w[posteriorAware >= 0.5])\n", - "\n", - "with plt.xkcd():\n", - "\n", - " # Plotting\n", - " plt.figure(figsize=(18, 6))\n", - "\n", - " # K-L divergence, perceptual states\n", - " plt.subplot(1, 2, 1)\n", - " plt.contourf(xgrid, xgrid, KL_w.T, cmap='viridis')\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('KL-divergence, perceptual states')\n", - " plt.axis('square')\n", - "\n", - " # K-L divergence, awareness state\n", - " plt.subplot(1, 2, 
2)\n", - " plt.contourf(xgrid, xgrid, KL_A.T, cmap='viridis')\n", - " plt.colorbar()\n", - " plt.xlabel('X1')\n", - " plt.ylabel('X2')\n", - " plt.title('KL-divergence, awareness state')\n", - " plt.axis('square')\n", - "\n", - " plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "b32e4908-0f6f-4259-832f-045adcb19700", - "metadata": { - "execution": {} - }, - "source": [ - "### Discussion point\n", - "\n", - "Can you recognise the difference between the KL divergence for the W-level and the one for the A-level?" - ] - }, - { - "cell_type": "markdown", - "id": "7d8deb66-9a1d-49e1-a3ef-96970efa8d97", - "metadata": { - "colab_type": "text", - "execution": {} - }, - "source": [ - "[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial1_Solution_f903bbb4.py)\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "869fc8f1-4199-4525-80b3-26e74babc66a", - "metadata": { - "execution": {} - }, - "outputs": [], - "source": [ - "with plt.xkcd():\n", - "\n", - " # Create figure with specified size\n", - " plt.figure(figsize=(10, 5))\n", - "\n", - " # KL divergence for W states\n", - " plt.subplot(1, 2, 1)\n", - " plt.bar(['unseen', 'seen'], [KL_w_absent, KL_w_present], color='k')\n", - " plt.ylabel('KL divergence, W states')\n", - " plt.xticks(fontsize=18)\n", - " plt.yticks(fontsize=18)\n", - "\n", - " # KL divergence for A states\n", - " plt.subplot(1, 2, 2)\n", - " plt.bar(['unseen', 'seen'], [KL_A_absent, KL_A_present], color='k')\n", - " plt.ylabel('KL divergence, A states')\n", - " plt.xticks(fontsize=18)\n", - " plt.yticks(fontsize=18)\n", - "\n", - " plt.tight_layout()\n", - "\n", - " # Show plot\n", - " plt.show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "64ecb92c-bfe3-4e49-bd40-f11ffa685ece", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - 
"content_review(f\"{feedback_prefix}_HOSS_Bonus_Content\")" - ] - }, - { - "cell_type": "markdown", - "id": "bcd87344-d473-44af-a881-b68e5471d353", - "metadata": { - "execution": {} - }, - "source": [ - "---\n", - "# Discussion Bonus Section\n", - "This section contains an extra discussion exercise if you have time and inclination." - ] - }, - { - "cell_type": "markdown", - "id": "ca33829c-8d54-437e-ba33-d3003af51d7a", - "metadata": { - "execution": {} - }, - "source": [ - "In this bonus section, Megan and Anil will delve into the complexities of defining and testing for consciousness, particularly in the context of artificial intelligence. We will explore various theoretical perspectives, examine classic and contemporary tests for consciousness, and discuss the challenges and ethical implications of determining whether a system truly possesses conscious experience." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "21b621ea-1639-4131-8ec3-9cdf34a64f77", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Video 11: Consciousness Bonus Content\n", - "\n", - "from ipywidgets import widgets\n", - "from IPython.display import YouTubeVideo\n", - "from IPython.display import IFrame\n", - "from IPython.display import display\n", - "\n", - "class PlayVideo(IFrame):\n", - " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", - " self.id = id\n", - " if source == 'Bilibili':\n", - " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", - " elif source == 'Osf':\n", - " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", - " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", - "\n", - "def display_videos(video_ids, W=400, H=300, fs=1):\n", - " tab_contents = []\n", - " for i, video_id in enumerate(video_ids):\n", - " out = widgets.Output()\n", - " with out:\n", - " if video_ids[i][0] == 
'Youtube':\n", - " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", - " height=H, fs=fs, rel=0)\n", - " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", - " else:\n", - " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", - " height=H, fs=fs, autoplay=False)\n", - " if video_ids[i][0] == 'Bilibili':\n", - " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", - " elif video_ids[i][0] == 'Osf':\n", - " print(f'Video available at https://osf.io/{video.id}')\n", - " display(video)\n", - " tab_contents.append(out)\n", - " return tab_contents\n", - "\n", - "video_ids = [('Youtube', '00dL8q7WgcU'), ('Bilibili', 'BV12n4y1Q7C2')]\n", - "tab_contents = display_videos(video_ids, W=854, H=480)\n", - "tabs = widgets.Tab()\n", - "tabs.children = tab_contents\n", - "for i in range(len(tab_contents)):\n", - " tabs.set_title(i, video_ids[i][0])\n", - "display(tabs)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "39c202fb-f580-4a96-8f8e-bad24ed1d55c", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_Video_11\")" - ] - }, - { - "cell_type": "markdown", - "id": "e9e839f2-e237-4ed4-9045-56dc7b5f6d60", - "metadata": { - "execution": {} - }, - "source": [ - "## Discussion activity: Is it actually conscious?" - ] - }, - { - "cell_type": "markdown", - "id": "2720c0b5-6386-43a6-9647-f1245531c376", - "metadata": { - "execution": {} - }, - "source": [ - "We discussed the difference between these two...\n", - "- \"Forward\" tests: passing means the machine is conscious (or intelligent).\n", - "- \"Reverse\" tests: passing means humans are convinced that a machine is conscious (or intelligent).\n", - "\n", - "**Discuss!** If a system (AI, other animal, other human) exhibited all the \"right signs\" of being conscious, how can we know for sure it is actually conscious? 
How could you design a test to be a true forward test?\n", - "\n", - "- Room 1: I think you could design a forward test in this way... [share your ideas]\n", - "- Room 2: I think a forward test is impossible, and here's why [share your ideas]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "84958157-c165-4cc3-be76-408999cf44ad", - "metadata": { - "cellView": "form", - "execution": {} - }, - "outputs": [], - "source": [ - "# @title Submit your feedback\n", - "content_review(f\"{feedback_prefix}_Discussion_activity\")" + "For the moment, let's switch to our second topic of the day, arguably one of the most important topics we'll cover: Ethics." ] } ], @@ -4395,7 +3680,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb b/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb index ad486f06a..099132da9 100644 --- a/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb +++ b/tutorials/W2D5_Mysteries/student/W2D5_Tutorial2.ipynb @@ -25,9 +25,9 @@ "\n", "__Content creators:__ Megan Peters, Joshua Shepherd, Jana Schaich Borg\n", "\n", - "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Alex Murphy\n", "\n", - "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk\n" + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Alex Murphy\n" ] }, { @@ -542,7 +542,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.19" + "version": "3.9.22" } }, "nbformat": 4, diff --git a/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb
b/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb new file mode 100644 index 000000000..023fc8554 --- /dev/null +++ b/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb @@ -0,0 +1,2342 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "89a00b06-154b-4aaf-8bee-b96a675406b5", + "metadata": { + "execution": {} + }, + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/neuromatch/NeuroAI_Course/blob/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb) &nbsp; [![Open In Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://raw.githubusercontent.com/neuromatch/NeuroAI_Course/main/tutorials/W2D5_Mysteries/student/W2D5_Tutorial3.ipynb)" + ] + }, + { + "cell_type": "markdown", + "id": "82ed61a3-87d2-4e76-83f6-4b786c101af2", + "metadata": { + "execution": {} + }, + "source": [ + "# (Bonus) Tutorial 3: Consciousness (Extended)\n", + "\n", + "**Week 2, Day 5: Mysteries**\n", + "\n", + "**By Neuromatch Academy**\n", + "\n", + "__Content creators:__ Steve Fleming, Guillaume Dumas, Samuele Bolotta, Juan David Vargas, Hakwan Lau, Anil Seth, Megan Peters\n", + "\n", + "__Content reviewers:__ Samuele Bolotta, Lily Chamakura, RyeongKyung Yoon, Yizhou Chen, Ruiyi Zhang, Patrick Mineault, Alex Murphy\n", + "\n", + "__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault, Alex Murphy\n" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "7861818a", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Install and import feedback gadget\n", + "\n", + "!pip install vibecheck numpy matplotlib Pillow torch torchvision transformers ipywidgets gradio trdg scikit-learn networkx pickleshare seaborn tabulate --quiet\n", + "\n", + "from vibecheck import DatatopsContentReviewContainer\n", + "def content_review(notebook_section: str):\n", + " return DatatopsContentReviewContainer(\n", + " \"\", # No text prompt\n", + " notebook_section,\n", + " {\n", + " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", + " \"name\": \"neuromatch_neuroai\",\n", + " \"user_key\": \"wb2cxze8\",\n", + " },\n", + " ).render()\n", + "\n", + "feedback_prefix = \"W2D5_T3\"" + ] + }, + { + "cell_type": "code", +
"execution_count": null, + "id": "4c4e3a7d", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Import dependencies\n", + "# @markdown\n", + "\n", + "import contextlib\n", + "import io\n", + "\n", + "with contextlib.redirect_stdout(io.StringIO()):\n", + " # Standard Libraries\n", + " import copy\n", + " import logging\n", + " import os\n", + " import random\n", + " import requests\n", + "\n", + " # Data Handling and Visualization Libraries\n", + " import numpy as np\n", + " import pandas as pd\n", + " import matplotlib.pyplot as plt\n", + " import seaborn as sns\n", + " from sklearn.metrics import precision_score, recall_score, fbeta_score\n", + " from sklearn.linear_model import LinearRegression\n", + " from tabulate import tabulate\n", + "\n", + " # Scientific Computing and Statistical Libraries\n", + " from numpy.linalg import inv\n", + " from scipy.special import logsumexp\n", + " from scipy.stats import multivariate_normal\n", + "\n", + " # Deep Learning Libraries\n", + " import torch\n", + " from torch import nn, optim, save, load\n", + " from torch.nn import functional as F\n", + " from torch.utils.data import DataLoader\n", + " import torch.nn.init as init\n", + " from torch.optim.lr_scheduler import StepLR\n", + "\n", + " # Image Processing Libraries\n", + " from PIL import Image\n", + " from matplotlib.patches import Patch\n", + " from mpl_toolkits.mplot3d import Axes3D\n", + "\n", + " # Interactive Elements and Web Applications\n", + " from IPython.display import IFrame\n", + " from IPython.display import Image as IMG\n", + " import gradio as gr\n", + " import ipywidgets as widgets\n", + " from ipywidgets import interact, IntSlider\n", + "\n", + " # Graph Analysis Libraries\n", + " import networkx as nx\n", + "\n", + " # Progress Monitoring Libraries\n", + " from tqdm import tqdm\n", + "\n", + " # Utilities and Miscellaneous Libraries\n", + " from itertools import product\n", + "\n", + " import 
math\n", + " !pip install torch_optimizer\n", + " import torch_optimizer as optim2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00f889a6", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Figure settings\n", + "# @markdown\n", + "\n", + "logging.getLogger('matplotlib.font_manager').disabled = True\n", + "\n", + "%matplotlib inline\n", + "%config InlineBackend.figure_format = 'retina' # perfrom high definition rendering for images and plots\n", + "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "98ca7c55", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Set device (GPU or CPU)\n", + "\n", + "def set_device():\n", + " \"\"\"\n", + " Determines and sets the computational device for PyTorch operations based on the availability of a CUDA-capable GPU.\n", + "\n", + " Outputs:\n", + " - device (str): The device that PyTorch will use for computations ('cuda' or 'cpu'). This string can be directly used\n", + " in PyTorch operations to specify the device.\n", + " \"\"\"\n", + "\n", + " device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", + " if device != \"cuda\":\n", + " print(\"GPU is not enabled in this notebook. \\n\"\n", + " \"If you want to enable it, in the menu under `Runtime` -> \\n\"\n", + " \"`Hardware accelerator.` and select `GPU` from the dropdown menu\")\n", + " else:\n", + " print(\"GPU is enabled in this notebook. 
\\n\"\n", + " \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n", + " \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n", + "\n", + " return device\n", + "\n", + "device = set_device()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2508d8b4", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Helper functions\n", + "\n", + "mse_loss = nn.BCELoss(reduction='sum')  # note: the deprecated size_average=False is equivalent to reduction='sum'\n", + "\n", + "lam = 1e-4\n", + "\n", + "from torch.autograd import Variable\n", + "\n", + "def CAE_loss(W, x, recons_x, h, lam):\n", + " \"\"\"Compute the Contractive AutoEncoder Loss\n", + "\n", + " Evaluates the CAE loss, which is composed as the summation of a Mean\n", + " Squared Error and the weighted l2-norm of the Jacobian of the hidden\n", + " units with respect to the inputs.\n", + "\n", + "\n", + " See reference below for an in-depth discussion:\n", + " #1: http://wiseodd.github.io/techblog/2016/12/05/contractive-autoencoder\n", + "\n", + " Args:\n", + " `W` (FloatTensor): (N_hidden x N), where N_hidden and N are the\n", + " dimensions of the hidden units and input respectively.\n", + " `x` (Variable): the input to the network, with dims (N_batch x N)\n", + " recons_x (Variable): the reconstruction of the input, with dims\n", + " N_batch x N.\n", + " `h` (Variable): the hidden units of the network, with dims\n", + " batch_size x N_hidden\n", + " `lam` (float): the weight given to the jacobian regulariser term\n", + "\n", + " Returns:\n", + " Variable: the (scalar) CAE loss\n", + " \"\"\"\n", + " mse = mse_loss(recons_x, x)\n", + " # Since: W is shape of N_hidden x N.
So, we do not need to transpose it as\n", + " # opposed to #1\n", + " dh = h * (1 - h) # Hadamard product produces size N_batch x N_hidden\n", + " # Sum through the input dimension to improve efficiency, as suggested in #1\n", + " w_sum = torch.sum(Variable(W)**2, dim=1)\n", + " # unsqueeze to avoid issues with torch.mv\n", + " w_sum = w_sum.unsqueeze(1) # shape N_hidden x 1\n", + " contractive_loss = torch.sum(torch.mm(dh**2, w_sum), 0)\n", + " return mse + contractive_loss.mul_(lam)\n", + "\n", + "class FirstOrderNetwork(nn.Module):\n", + " def __init__(self, hidden_units, data_factor, use_gelu):\n", + " \"\"\"\n", + " Initializes the FirstOrderNetwork with specific configurations.\n", + "\n", + " Parameters:\n", + " - hidden_units (int): The number of units in the hidden layer.\n", + " - data_factor (int): Factor to scale the amount of data processed.\n", + " A factor of 1 indicates the default data amount,\n", + " while 10 indicates 10 times the default amount.\n", + " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", + " \"\"\"\n", + " super(FirstOrderNetwork, self).__init__()\n", + "\n", + " # Define the encoder, hidden, and decoder layers with specified units\n", + "\n", + " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", + " self.hidden= nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", + " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", + "\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + "\n", + "\n", + " # Dropout layer to prevent overfitting\n", + " self.dropout = nn.Dropout(0.1)\n", + "\n", + " # Set the data factor\n", + " self.data_factor = data_factor\n", + "\n", + " # Other activation functions for various purposes\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize network weights\n", + " self.initialize_weights()\n", + "\n", + " def initialize_weights(self):\n", + " \"\"\"Initializes weights of the encoder, hidden, 
and decoder layers uniformly.\"\"\"\n", + " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", + " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", + " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", + "\n", + " def encoder(self, x):\n", + " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", + " return h1\n", + "\n", + " def decoder(self,z):\n", + " #h2 = self.relu(self.hidden(z))\n", + " h2 = self.sigmoid(self.fc2(z))\n", + " return h2\n", + "\n", + " def forward(self, x):\n", + " \"\"\"\n", + " Defines the forward pass through the network.\n", + "\n", + " Parameters:\n", + " - x (Tensor): The input tensor to the network.\n", + "\n", + " Returns:\n", + " - Tensor: The output of the network after passing through the layers and activations.\n", + " \"\"\"\n", + " h1 = self.encoder(x)\n", + " h2 = self.decoder(h1)\n", + "\n", + " return h1 , h2\n", + "\n", + "class SecondOrderNetwork(nn.Module):\n", + " def __init__(self, use_gelu):\n", + " super(SecondOrderNetwork, self).__init__()\n", + " # Define a linear layer for comparing the difference between input and output of the first-order network\n", + " self.comparison_layer = nn.Linear(100, 100)\n", + "\n", + " # Linear layer for determining wagers, mapping from 100 features to a single output\n", + " self.wager = nn.Linear(100, 1)\n", + "\n", + " # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " # Select activation function based on the `use_gelu` flag\n", + " self.activation = torch.relu\n", + "\n", + " # Additional activation functions for potential use in network operations\n", + " self.sigmoid = torch.sigmoid\n", + "\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize the weights of the network\n", + " self._init_weights()\n", + "\n", + " def _init_weights(self):\n", + " # Uniformly initialize weights for the comparison and wager layers\n", + " 
init.uniform_(self.comparison_layer.weight, -1.0, 1.0)\n", + " init.uniform_(self.wager.weight, 0.0, 0.1)\n", + "\n", + " def forward(self, first_order_input, first_order_output):\n", + " # Calculate the difference between the first-order input and output\n", + " comparison_matrix = first_order_input - first_order_output\n", + "\n", + " #Another option is to directly calculate the per unit MSE to use as input for the comparator matrix\n", + " #comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input)\n", + "\n", + " # Pass the difference through the comparison layer and apply the chosen activation function\n", + " comparison_out=self.dropout(self.activation(self.comparison_layer(comparison_matrix)))\n", + "\n", + " # Calculate the wager value, applying dropout and sigmoid activation to the output of the wager layer\n", + " wager = self.sigmoid(self.wager(comparison_out))\n", + "\n", + " return wager\n", + "\n", + "def initialize_global():\n", + " global Input_Size_1, Hidden_Size_1, Output_Size_1, Input_Size_2\n", + " global num_units, patterns_number\n", + " global learning_rate_2, momentum, temperature , Threshold\n", + " global First_set, Second_set, Third_set\n", + " global First_set_targets, Second_set_targets, Third_set_targets\n", + " global epoch_list, epoch_1_order, epoch_2_order, patterns_matrix1\n", + " global testing_graph_names\n", + "\n", + " global optimizer ,n_epochs , learning_rate_1\n", + " learning_rate_1 = 0.5\n", + " n_epochs = 100\n", + " optimizer=\"ADAMAX\"\n", + "\n", + " # Network sizes\n", + " Input_Size_1 = 100\n", + " Hidden_Size_1 = 60\n", + " Output_Size_1 = 100\n", + " Input_Size_2 = 100\n", + "\n", + " # Patterns\n", + " num_units = 100\n", + " patterns_number = 200\n", + "\n", + " # Pre-training and hyperparameters\n", + " learning_rate_2 = 0.1\n", + " momentum = 0.9\n", + " temperature = 1.0\n", + " Threshold=0.5\n", + "\n", + " # Testing\n", + " First_set = []\n", + " Second_set = []\n", + " Third_set 
= []\n", + " First_set_targets = []\n", + " Second_set_targets = []\n", + " Third_set_targets = []\n", + "\n", + " # Graphic of pretraining\n", + " epoch_list = list(range(1, n_epochs + 1))\n", + " epoch_1_order = np.zeros(n_epochs)\n", + " epoch_2_order = np.zeros(n_epochs)\n", + " patterns_matrix1 = torch.zeros((n_epochs, patterns_number), device=device) # Initialize patterns_matrix1 as a PyTorch tensor on the selected device\n", + "\n", + "def compute_metrics(TP, TN, FP, FN):\n", + " \"\"\"Compute precision, recall, F1 score, and accuracy.\"\"\"\n", + " precision = round(TP / (TP + FP), 2) if (TP + FP) > 0 else 0\n", + " recall = round(TP / (TP + FN), 2) if (TP + FN) > 0 else 0\n", + " f1_score = round(2 * (precision * recall) / (precision + recall), 2) if (precision + recall) > 0 else 0\n", + " accuracy = round((TP + TN) / (TP + TN + FP + FN), 2) if (TP + TN + FP + FN) > 0 else 0\n", + " return precision, recall, f1_score, accuracy\n", + "\n", + "# Define the architecture, optimizers, loss functions, and schedulers for pre-training\n", + "def prepare_pre_training(hidden, factor, gelu, stepsize, gam):\n", + "\n", + " first_order_network = FirstOrderNetwork(hidden, factor, gelu).to(device)\n", + " second_order_network = SecondOrderNetwork(gelu).to(device)\n", + "\n", + " criterion_1 = CAE_loss\n", + " criterion_2 = nn.BCELoss(reduction='sum')\n", + "\n", + " if optimizer == \"ADAM\":\n", + " optimizer_1 = optim.Adam(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.Adam(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"SGD\":\n", + " optimizer_1 = optim.SGD(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.SGD(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"SWATS\":\n", + " optimizer_1 = optim2.SWATS(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = 
optim2.SWATS(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"ADAMW\":\n", + " optimizer_1 = optim.AdamW(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.AdamW(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"RMS\":\n", + " optimizer_1 = optim.RMSprop(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.RMSprop(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " elif optimizer == \"ADAMAX\":\n", + " optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", + " optimizer_2 = optim.Adamax(second_order_network.parameters(), lr=learning_rate_2)\n", + "\n", + " # Learning rate schedulers\n", + " scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", + " scheduler_2 = StepLR(optimizer_2, step_size=stepsize, gamma=gam)\n", + "\n", + " return first_order_network, second_order_network, criterion_1 , criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2\n", + "\n", + "def title(string):\n", + " # Enable XKCD plot styling\n", + " with plt.xkcd():\n", + " # Create a figure and an axes.\n", + " fig, ax = plt.subplots()\n", + "\n", + " # Create a rectangle patch with specified dimensions and styles\n", + " rectangle = patches.Rectangle((0.05, 0.1), 0.9, 0.4, linewidth=1, edgecolor='r', facecolor='blue', alpha=0.5)\n", + " ax.add_patch(rectangle)\n", + "\n", + " # Place text inside the rectangle, centered\n", + " plt.text(0.5, 0.3, string, horizontalalignment='center', verticalalignment='center', fontsize=26, color='white')\n", + "\n", + " # Set plot limits\n", + " ax.set_xlim(0, 1)\n", + " ax.set_ylim(0, 1)\n", + "\n", + " # Disable axis display\n", + " ax.axis('off')\n", + "\n", + " # Display the plot\n", + " plt.show()\n", + "\n", + " # Close the figure to free up memory\n", + " plt.close(fig)\n", + "\n", + "# Function to configure the training environment and load 
the models\n", + "def get_test_patterns(factor):\n", + " \"\"\"\n", + " Generates the three sets of testing patterns used for evaluation\n", + " (suprathreshold, subthreshold, and low vision).\n", + "\n", + " Returns:\n", + " - Tuple of testing patterns, number of samples in the testing patterns\n", + " \"\"\"\n", + " # Generating testing patterns for three different sets\n", + " first_set, first_set_targets = create_patterns(0, factor)\n", + " second_set, second_set_targets = create_patterns(1, factor)\n", + " third_set, third_set_targets = create_patterns(2, factor)\n", + "\n", + " # Aggregate testing patterns and their targets for ease of access\n", + " testing_patterns = [[first_set, first_set_targets], [second_set, second_set_targets], [third_set, third_set_targets]]\n", + "\n", + " # Determine the number of samples from the first set (assumed consistent across all sets)\n", + " n_samples = len(testing_patterns[0][0])\n", + "\n", + " return testing_patterns, n_samples\n", + "\n", + "# Function to plot the input and output of the first-order network side by side\n", + "def plot_input_output(input_data, output_data, index):\n", + " fig, axes = plt.subplots(1, 2, figsize=(10, 6))\n", + "\n", + " # Plot input data\n", + " im1 = axes[0].imshow(input_data.cpu().numpy(), aspect='auto', cmap='viridis')\n", + " axes[0].set_title('Input')\n", + " fig.colorbar(im1, ax=axes[0])\n", + "\n", + " # Plot output data\n", + " im2 = axes[1].imshow(output_data.cpu().numpy(), aspect='auto', cmap='viridis')\n", + " axes[1].set_title('Output')\n", + " fig.colorbar(im2, ax=axes[1])\n", + "\n", + " plt.suptitle(f'Testing Pattern {index+1}')\n", + " plt.show()\n", + "\n", + "# Function to test the model using the configured testing patterns\n", + "def testing(testing_patterns, n_samples, loaded_model, loaded_model_2, factor):\n", + "\n", + " def generate_chance_level(shape):\n", + " chance_level = 
np.random.rand(*shape).tolist()\n", + " return chance_level\n", + "\n", + " results_for_plotting = []\n", + " max_values_output_first_order = []\n", + " max_indices_output_first_order = []\n", + " max_values_patterns_tensor = []\n", + " max_indices_patterns_tensor = []\n", + " f1_scores_wager = []\n", + "\n", + " mse_losses_indices = []\n", + " mse_losses_values = []\n", + " discrimination_performances = []\n", + "\n", + " # Iterate through each set of testing patterns and targets\n", + " for i in range(len(testing_patterns)):\n", + " with torch.no_grad(): # Ensure no gradients are computed during testing\n", + "\n", + " # For low vision, the stimulus multiplier was set to 0.3, as can be seen in the generate_patterns function\n", + " threshold = 0.5\n", + " if i == 2:\n", + " threshold = 0.15\n", + "\n", + " # Obtain outputs from the first- and second-order models\n", + " input_data = testing_patterns[i][0]\n", + " hidden_representation, output_first_order = loaded_model(input_data)\n", + " output_second_order = loaded_model_2(input_data, output_first_order)\n", + "\n", + " delta = 100 * factor\n", + "\n", + " print(\"discriminator\")\n", + " print((output_first_order[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean())\n", + " discrimination_performance = round((output_first_order[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", + " discrimination_performances.append(discrimination_performance)\n", + "\n", + " chance_level = torch.Tensor(generate_chance_level((200 * factor, 100))).to(device)\n", + " discrimination_random = round((chance_level[delta:].argmax(dim=1) == input_data[delta:].argmax(dim=1)).to(float).mean().item(), 2)\n", + " print(\"chance level\", discrimination_random)\n", + "\n", + " # Collect the wagers for all patterns in the dataset\n", + " wagers = output_second_order[delta:].cpu()\n", + "\n", + " _, targets_2 = torch.max(testing_patterns[i][1], 1)\n", + " targets_2 = 
targets_2[delta:].cpu()\n", + "\n", + " # Convert targets to binary classification for wagering scenario\n", + " targets_2 = (targets_2 > 0).int()\n", + "\n", + " # Convert tensors to NumPy arrays for metric calculations\n", + " predicted_np = wagers.numpy().flatten()\n", + " targets_2_np = targets_2.numpy()\n", + "\n", + " #print(\"number of targets,\" , len(targets_2_np))\n", + "\n", + " print(predicted_np)\n", + " print(targets_2_np)\n", + "\n", + " # Calculate True Positives, True Negatives, False Positives, and False Negatives\n", + " TP = np.sum((predicted_np > threshold) & (targets_2_np > threshold))\n", + " TN = np.sum((predicted_np < threshold ) & (targets_2_np < threshold))\n", + " FP = np.sum((predicted_np > threshold) & (targets_2_np < threshold))\n", + " FN = np.sum((predicted_np < threshold) & (targets_2_np > threshold))\n", + "\n", + " # Compute precision, recall, F1 score, and accuracy for both high and low wager scenarios\n", + " precision_h, recall_h, f1_score_h, accuracy_h = compute_metrics(TP, TN, FP, FN)\n", + "\n", + " f1_scores_wager.append(f1_score_h)\n", + "\n", + " # Collect results for plotting\n", + " results_for_plotting.append({\n", + " \"counts\": [[TP, FP, TP + FP]],\n", + " \"metrics\": [[precision_h, recall_h, f1_score_h, accuracy_h]],\n", + " \"title_results\": f\"Results Table - Set {i+1}\",\n", + " \"title_metrics\": f\"Metrics Table - Set {i+1}\"\n", + " })\n", + "\n", + " # Plot input and output of the first-order network\n", + " plot_input_output(input_data, output_first_order, i)\n", + "\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " max_vals_pat, max_inds_pat = torch.max(input_data[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " 
max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + " fig, axs = plt.subplots(1, 2, figsize=(15, 5))\n", + "\n", + " # Scatter plot of indices: patterns_tensor vs. output_first_order\n", + " axs[0].scatter(max_indices_patterns_tensor[i], max_indices_output_first_order[i], alpha=0.5)\n", + " axs[0].set_title(f'Stimuli location: Condition {i+1} - First Order Input vs. First Order Output')\n", + " axs[0].set_xlabel('First Order Input Indices')\n", + " axs[0].set_ylabel('First Order Output Indices')\n", + "\n", + " # Add quadratic fit to scatter plot\n", + " x_indices = max_indices_patterns_tensor[i]\n", + " y_indices = max_indices_output_first_order[i]\n", + " y_pred_indices = perform_quadratic_regression(x_indices, y_indices)\n", + " axs[0].plot(x_indices, y_pred_indices, color='skyblue')\n", + "\n", + "\n", + " # Calculate MSE loss for indices\n", + " mse_loss_indices = np.mean((np.array(x_indices) - np.array(y_indices)) ** 2)\n", + " mse_losses_indices.append(mse_loss_indices)\n", + "\n", + " # Scatter plot of values: patterns_tensor vs. output_first_order\n", + " axs[1].scatter(max_values_patterns_tensor[i], max_values_output_first_order[i], alpha=0.5)\n", + " axs[1].set_title(f'Stimuli Values: Condition {i+1} - First Order Input vs. 
First Order Output')\n", + " axs[1].set_xlabel('First Order Input Values')\n", + " axs[1].set_ylabel('First Order Output Values')\n", + "\n", + " # Add quadratic fit to scatter plot\n", + " x_values = max_values_patterns_tensor[i]\n", + " y_values = max_values_output_first_order[i]\n", + " y_pred_values = perform_quadratic_regression(x_values, y_values)\n", + " axs[1].plot(x_values, y_pred_values, color='skyblue')\n", + "\n", + " # Calculate MSE loss for values\n", + " mse_loss_values = np.mean((np.array(x_values) - np.array(y_values)) ** 2)\n", + " mse_losses_values.append(mse_loss_values)\n", + "\n", + " plt.tight_layout()\n", + " plt.show()\n", + "\n", + " return f1_scores_wager, mse_losses_indices , mse_losses_values, discrimination_performances, results_for_plotting\n", + "\n", + "def generate_patterns(patterns_number, num_units, factor, condition = 0):\n", + " \"\"\"\n", + " Generates patterns and targets for training the networks\n", + "\n", + " # patterns_number: Number of patterns to generate\n", + " # num_units: Number of units in each pattern\n", + " # pattern: 0: superthreshold, 1: subthreshold, 2: low vision\n", + " # Returns lists of patterns, stimulus present/absent indicators, and second order targets\n", + " \"\"\"\n", + "\n", + " patterns_number= patterns_number*factor\n", + "\n", + " patterns = [] # Store generated patterns\n", + " stim_present = [] # Indicators for when a stimulus is present in the pattern\n", + " stim_absent = [] # Indicators for when no stimulus is present\n", + " order_2_pr = [] # Second order network targets based on the presence or absence of stimulus\n", + "\n", + " if condition == 0:\n", + " random_limit= 0.0\n", + " baseline = 0\n", + " multiplier = 1\n", + "\n", + " if condition == 1:\n", + " random_limit= 0.02\n", + " baseline = 0.0012\n", + " multiplier = 1\n", + "\n", + " if condition == 2:\n", + " random_limit= 0.02\n", + " baseline = 0.0012\n", + " multiplier = 0.3\n", + "\n", + " # Generate patterns, half noise 
and half potential stimuli\n", + " for i in range(patterns_number):\n", + "\n", + " # First half: Noise patterns\n", + " if i < patterns_number // 2:\n", + "\n", + " pattern = multiplier * np.random.uniform(0.0, random_limit, num_units) + baseline # Generate a noise pattern\n", + " patterns.append(pattern)\n", + " stim_present.append(np.zeros(num_units)) # Stimulus absent\n", + " order_2_pr.append([0.0 , 1.0]) # No stimulus, low wager\n", + "\n", + " # Second half: Stimulus patterns\n", + " else:\n", + " stimulus_number = random.randint(0, num_units - 1) # Choose a unit for potential stimulus\n", + " pattern = np.random.uniform(0.0, random_limit, num_units) + baseline\n", + " pattern[stimulus_number] = np.random.uniform(0.0, 1.0) * multiplier # Set stimulus intensity\n", + "\n", + " patterns.append(pattern)\n", + " present = np.zeros(num_units)\n", + " # Determine if stimulus is above discrimination threshold\n", + " if pattern[stimulus_number] >= multiplier/2:\n", + " order_2_pr.append([1.0 , 0.0]) # Stimulus detected, high wager\n", + " present[stimulus_number] = 1.0\n", + " else:\n", + " order_2_pr.append([0.0 , 1.0]) # Stimulus not detected, low wager\n", + " present[stimulus_number] = 0.0\n", + "\n", + " stim_present.append(present)\n", + "\n", + "\n", + " patterns_tensor = torch.Tensor(patterns).to(device).requires_grad_(True)\n", + " stim_present_tensor = torch.Tensor(stim_present).to(device).requires_grad_(True)\n", + " stim_absent_tensor= torch.Tensor(stim_absent).to(device).requires_grad_(True)\n", + " order_2_tensor = torch.Tensor(order_2_pr).to(device).requires_grad_(True)\n", + "\n", + " return patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor\n", + "\n", + "def create_patterns(stimulus,factor):\n", + " \"\"\"\n", + " Generates neural network input patterns based on specified stimulus conditions.\n", + "\n", + " Parameters:\n", + " - stimulus (int): Determines the type of patterns to generate.\n", + " Acceptable values:\n", + " - 
0: Suprathreshold stimulus\n", + " - 1: Subthreshold stimulus\n", + " - 2: Low vision condition\n", + "\n", + " Returns:\n", + " - torch.Tensor: Tensor of generated patterns.\n", + " - torch.Tensor: Tensor of target values corresponding to the generated patterns.\n", + " \"\"\"\n", + "\n", + " # Generate initial patterns and target tensors for base condition.\n", + "\n", + " patterns_tensor, stim_present_tensor, _, _ = generate_patterns(patterns_number, num_units ,factor, stimulus)\n", + " # Convert pattern tensors for processing on specified device (CPU/GPU).\n", + " patterns = torch.Tensor(patterns_tensor).to(device)\n", + " targets = torch.Tensor(stim_present_tensor).to(device)\n", + "\n", + " return patterns, targets\n", + "\n", + "def pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta):\n", + " \"\"\"\n", + " Conducts pre-training for first-order and second-order networks.\n", + "\n", + " Parameters:\n", + " - first_order_network (torch.nn.Module): Network for basic input-output mapping.\n", + " - second_order_network (torch.nn.Module): Network for decision-making based on the first network's output.\n", + " - criterion_1, criterion_2 (torch.nn): Loss functions for the respective networks.\n", + " - optimizer_1, optimizer_2 (torch.optim): Optimizers for the respective networks.\n", + " - scheduler_1, scheduler_2 (torch.optim.lr_scheduler): Schedulers for learning rate adjustment.\n", + " - factor (float): Parameter influencing data augmentation or pattern generation.\n", + " - meta (bool): Flag indicating the use of meta-learning strategies.\n", + "\n", + " Returns:\n", + " Tuple containing updated networks and epoch-wise loss records.\n", + "\n", + " \"\"\"\n", + " def get_num_args(func):\n", + " return func.__code__.co_argcount\n", + "\n", + " max_values_output_first_order = []\n", + " max_indices_output_first_order = []\n", + " max_values_patterns_tensor = []\n", + 
" max_indices_patterns_tensor = []\n", + "\n", + " epoch_1_order = np.zeros(n_epochs)\n", + " epoch_2_order = np.zeros(n_epochs)\n", + "\n", + " for epoch in range(n_epochs):\n", + " # Generate training patterns and targets for each epoch\n", + " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", + "\n", + " # Forward pass through the first-order network\n", + " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", + "\n", + " patterns_tensor=patterns_tensor.requires_grad_(True)\n", + " output_first_order=output_first_order.requires_grad_(True)\n", + "\n", + " # Get max values and indices for output_first_order\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " # Get max values and indices for patterns_tensor\n", + " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + " optimizer_1.zero_grad()\n", + "\n", + " # Conditionally execute the second-order network pass and related operations\n", + " if meta:\n", + "\n", + " # Forward pass through the second-order network with inputs from the first-order network\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the second-order network (wagering decision based on comparison)\n", + " loss_2 = criterion_2(output_second_order.squeeze(), order_2_tensor[:, 0])\n", + "\n", + " optimizer_2.zero_grad()\n", + "\n", + "\n", + " # Backpropagate the second-order network's loss\n", + " loss_2.backward(retain_graph=True) 
# Allows further backpropagation for loss_1 after loss_2\n", + "\n", + " # Update second-order network weights\n", + " optimizer_2.step()\n", + "\n", + " scheduler_2.step()\n", + "\n", + " epoch_2_order[epoch] = loss_2.item()\n", + " else:\n", + " # Skip computations for the second-order network\n", + " with torch.no_grad():\n", + " # Potentially forward pass through the second-order network without tracking gradients\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", + "\n", + " num_args = get_num_args(criterion_1)\n", + "\n", + " if num_args == 2:\n", + " loss_1 = criterion_1( output_first_order , stim_present_tensor )\n", + " else:\n", + " W = first_order_network.state_dict()['fc1.weight']\n", + " loss_1 = criterion_1( W, stim_present_tensor.view(-1, 100), output_first_order,\n", + " hidden_representation, lam )\n", + "\n", + " # Backpropagate the first-order network's loss\n", + " loss_1.backward()\n", + "\n", + " # Update first-order network weights\n", + " optimizer_1.step()\n", + "\n", + " # Reset first-order optimizer gradients to zero for the next iteration\n", + "\n", + " # Update the first-order scheduler\n", + " scheduler_1.step()\n", + "\n", + " epoch_1_order[epoch] = loss_1.item()\n", + " #epoch_1_order[epoch] = loss_location.item()\n", + "\n", + " return first_order_network, second_order_network, epoch_1_order, epoch_2_order , (max_values_output_first_order[-1],\n", + " max_indices_output_first_order[-1],\n", + " max_values_patterns_tensor[-1],\n", + " max_indices_patterns_tensor[-1])\n", + "\n", + "def HOSS_evaluate(X, mu, Sigma, Aprior, Wprior):\n", + " \"\"\"\n", + " Inference on 2D Bayes net for asymmetric inference on presence vs. 
absence.\n", + " \"\"\"\n", + "\n", + " # Initialise variables and conditional prob tables\n", + " p_A = np.array([1 - Aprior, Aprior]) # prior on awareness state A\n", + " p_W_a1 = np.append(0, Wprior) # likelihood of world states W given aware, first entry is absence\n", + " p_W_a0 = np.append(1, np.zeros(len(Wprior))) # likelihood of world states W given unaware, first entry is absence\n", + " p_W = (p_W_a1 + p_W_a0) / 2 # prior on W marginalising over A (for KL)\n", + "\n", + " # Compute likelihood of observed X for each possible W (P(X|mu_w, Sigma))\n", + " lik_X_W = np.array([multivariate_normal.pdf(X, mean=mu_i, cov=Sigma) for mu_i in mu])\n", + " p_X_W = lik_X_W / lik_X_W.sum() # normalise to get P(X|W)\n", + "\n", + " # Combine with likelihood of each world state w given awareness state A\n", + " lik_W_A = np.vstack((p_X_W * p_W_a0 * p_A[0], p_X_W * p_W_a1 * p_A[1]))\n", + " post_A = lik_W_A.sum(axis=1) # sum over W\n", + " post_A = post_A / post_A.sum() # normalise\n", + "\n", + " # Posterior over W (P(W|X=x) marginalising over A)\n", + " post_W = lik_W_A.sum(axis=0) # sum over A\n", + " post_W = post_W / post_W.sum() # normalise\n", + "\n", + " # KL divergences\n", + " KL_W = (post_W * np.log(post_W / p_W)).sum()\n", + " KL_A = (post_A * np.log(post_A / p_A)).sum()\n", + "\n", + " return post_W, post_A, KL_W, KL_A" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4e5761b4", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Plotting functions\n", + "# @markdown\n", + "\n", + "def plot_testing(results_seed, discrimination_seed, seeds, title):\n", + " print(results_seed)\n", + " print(discrimination_seed)\n", + "\n", + " Testing_graph_names = [\"Suprathreshold stimulus\", \"Subthreshold stimulus\", \"Low Vision\"]\n", + "\n", + " fig, ax = plt.subplots(figsize=(14, len(results_seed[0]) * 2 + 2)) # Adjusted for added header space\n", + " ax.axis('off')\n", + " ax.axis('tight')\n", + 
"\n", + " # Define column labels\n", + " col_labels = [\"Scenario\", \"F1 SCORE\\n(2nd order network)\", \"RECALL\\n(2nd order network)\", \"PRECISION\\n(2nd order network)\", \"Discrimination Performance\\n(1st order network)\", \"ACCURACY\\n(2nd order network)\"]\n", + "\n", + " # Initialize list to hold all rows of data including headers\n", + " full_data = []\n", + "\n", + " # Calculate averages and standard deviations\n", + " for i in range(len(results_seed[0])):\n", + " metrics_list = [result[i][\"metrics\"][0] for result in results_seed] # Collect metrics for each seed\n", + " discrimination_list = [discrimination_seed[j][i] for j in range(seeds)]\n", + "\n", + " # Calculate averages and standard deviations for metrics\n", + " avg_metrics = np.mean(metrics_list, axis=0).tolist()\n", + " std_metrics = np.std(metrics_list, axis=0).tolist()\n", + "\n", + " # Calculate average and standard deviation for discrimination performance\n", + " avg_discrimination = np.mean(discrimination_list)\n", + " std_discrimination = np.std(discrimination_list)\n", + "\n", + " # Format the row with averages and standard deviations\n", + " row = [\n", + " Testing_graph_names[i],\n", + " f\"{avg_metrics[2]:.2f} ± {std_metrics[2]:.2f}\", # F1 SCORE\n", + " f\"{avg_metrics[1]:.2f} ± {std_metrics[1]:.2f}\", # RECALL\n", + " f\"{avg_metrics[0]:.2f} ± {std_metrics[0]:.2f}\", # PRECISION\n", + " f\"{avg_discrimination:.2f} ± {std_discrimination:.2f}\", # Discrimination Performance\n", + " f\"{avg_metrics[3]:.2f} ± {std_metrics[3]:.2f}\" # ACCURACY\n", + " ]\n", + " full_data.append(row)\n", + "\n", + " # Extract metric values for color scaling (excluding the first and last columns which are text)\n", + " metric_values = np.array([[float(x.split(\" ± \")[0]) for x in row[1:]] for row in full_data]) # Convert to float for color scaling\n", + " max_value = np.max(metric_values)\n", + " colors = metric_values / max_value # Normalize for color mapping\n", + "\n", + " # Prepare colors for all 
cells, defaulting to white for non-metric cells\n", + " cell_colors = [[\"white\"] * len(col_labels) for _ in range(len(full_data))]\n", + " for i, row in enumerate(colors):\n", + " cell_colors[i][1] = plt.cm.RdYlGn(row[0])\n", + " cell_colors[i][2] = plt.cm.RdYlGn(row[1])\n", + " cell_colors[i][3] = plt.cm.RdYlGn(row[2])\n", + " cell_colors[i][5] = plt.cm.RdYlGn(row[3]) # Adding color for accuracy\n", + "\n", + " # Adding color for discrimination performance\n", + " discrimination_colors = colors[:, 3]\n", + " for i, dp_color in enumerate(discrimination_colors):\n", + " cell_colors[i][4] = plt.cm.RdYlGn(dp_color)\n", + "\n", + " # Create the main table with cell colors\n", + " table = ax.table(cellText=full_data, colLabels=col_labels, loc='center', cellLoc='center', cellColours=cell_colors)\n", + " table.auto_set_font_size(False)\n", + " table.set_fontsize(10)\n", + " table.scale(1.5, 1.5)\n", + "\n", + " # Set the height of the header row to be double that of the other rows\n", + " for j, col_label in enumerate(col_labels):\n", + " cell = table[(0, j)]\n", + " cell.set_height(cell.get_height() * 2)\n", + "\n", + " # Add chance level table\n", + " chance_level_data = [[\"Chance Level\\nDiscrimination(1st)\", \"Chance Level\\nAccuracy(2nd)\"],\n", + " [\"0.010\", \"0.50\"]]\n", + "\n", + " chance_table = ax.table(cellText=chance_level_data, bbox=[1.0, 0.8, 0.3, 0.1], cellLoc='center', colWidths=[0.1, 0.1])\n", + " chance_table.auto_set_font_size(False)\n", + " chance_table.set_fontsize(10)\n", + " chance_table.scale(1.2, 1.2)\n", + "\n", + " # Set the height of the header row to be double that of the other rows in the chance level table\n", + " for j in range(len(chance_level_data[0])):\n", + " cell = chance_table[(0, j)]\n", + " cell.set_height(cell.get_height() * 2)\n", + "\n", + " plt.title(title, pad=20, fontsize=16)\n", + " plt.show()\n", + " plt.close(fig)\n", + "\n", + "\n", + "def plot_signal_max_and_indicator(patterns_tensor, plot_title=\"Training 
Signals\"):\n", + " \"\"\"\n", + " Plots the maximum values of signal units and a binary indicator for max values greater than 0.5.\n", + "\n", + " Parameters:\n", + " - patterns_tensor: A tensor containing signals, where each signal is expected to have multiple units.\n", + " \"\"\"\n", + " with plt.xkcd():\n", + "\n", + " # Calculate the maximum value of units for each signal within the patterns tensor\n", + " max_values_of_units = patterns_tensor.max(dim=1).values.cpu().numpy() # Ensure it's on CPU and in NumPy format for plotting\n", + "\n", + " # Determine the binary indicators based on the max value being greater than 0.5\n", + " binary_indicators = (max_values_of_units > 0.5).astype(int)\n", + "\n", + " # Create a figure with 2 subplots (2 rows, 1 column)\n", + " fig, axs = plt.subplots(2, 1, figsize=(8, 8))\n", + "\n", + " fig.suptitle(plot_title, fontsize=16) # Set the overall title for the plot\n", + "\n", + " # First subplot for the maximum values of each signal\n", + " axs[0].plot(range(patterns_tensor.size(0)), max_values_of_units, drawstyle='steps-mid')\n", + " axs[0].set_xlabel('Pattern Number')\n", + " axs[0].set_ylabel('Max Value of Signal Units')\n", + " axs[0].set_ylim(-0.1, 1.1) # Adjust y-axis limits for clarity\n", + " axs[0].grid(True)\n", + "\n", + " # Second subplot for the binary indicators\n", + " axs[1].plot(range(patterns_tensor.size(0)), binary_indicators, drawstyle='steps-mid', color='red')\n", + " axs[1].set_xlabel('Pattern Number')\n", + " axs[1].set_ylabel('Indicator (Max > 0.5) in each signal')\n", + " axs[1].set_ylim(-0.1, 1.1) # Adjust y-axis limits for clarity\n", + " axs[1].grid(True)\n", + "\n", + " plt.tight_layout()\n", + " plt.show()\n", + "\n", + "\n", + "def perform_quadratic_regression(epoch_list, values):\n", + " # Perform quadratic regression\n", + " coeffs = np.polyfit(epoch_list, values, 2) # Coefficients of the polynomial\n", + " y_pred = np.polyval(coeffs, epoch_list) # Evaluate the polynomial at the given x 
values\n", + " return y_pred\n", + "\n", + "\n", + "def pre_train_plots(epoch_1_order, epoch_2_order, title, max_values_indices):\n", + " \"\"\"\n", + " Plots the training progress with regression lines and scatter plots of indices and values of max elements.\n", + "\n", + " Parameters:\n", + " - epoch_list (list): List of epoch numbers.\n", + " - epoch_1_order (list): Loss values for the first-order network over epochs.\n", + " - epoch_2_order (list): Loss values for the second-order network over epochs.\n", + " - title (str): Title for the plots.\n", + " - max_values_indices (tuple): Tuple containing lists of max values and indices for both tensors.\n", + " \"\"\"\n", + " (max_values_output_first_order,\n", + " max_indices_output_first_order,\n", + " max_values_patterns_tensor,\n", + " max_indices_patterns_tensor) = max_values_indices\n", + "\n", + " # Perform quadratic regression for the loss plots\n", + " epoch_list = list(range(len(epoch_1_order)))\n", + " y_pred1 = perform_quadratic_regression(epoch_list, epoch_1_order)\n", + " y_pred2 = perform_quadratic_regression(epoch_list, epoch_2_order)\n", + "\n", + " # Set up the plot with 2 rows and 2 columns\n", + " fig, axs = plt.subplots(2, 2, figsize=(15, 10))\n", + "\n", + " # First graph for 1st Order Network\n", + " axs[0, 0].plot(epoch_list, epoch_1_order, linestyle='--', marker='o', color='g')\n", + " axs[0, 0].plot(epoch_list, y_pred1, linestyle='-', color='r', label='Quadratic Fit')\n", + " axs[0, 0].legend(['1st Order Network', 'Quadratic Fit'])\n", + " axs[0, 0].set_title('1st Order Network Loss')\n", + " axs[0, 0].set_xlabel('Epochs - Pretraining Phase')\n", + " axs[0, 0].set_ylabel('Loss')\n", + "\n", + " # Second graph for 2nd Order Network\n", + " axs[0, 1].plot(epoch_list, epoch_2_order, linestyle='--', marker='o', color='b')\n", + " axs[0, 1].plot(epoch_list, y_pred2, linestyle='-', color='r', label='Quadratic Fit')\n", + " axs[0, 1].legend(['2nd Order Network', 'Quadratic Fit'])\n", + " axs[0, 
1].set_title('2nd Order Network Loss')\n", + " axs[0, 1].set_xlabel('Epochs - Pretraining Phase')\n", + " axs[0, 1].set_ylabel('Loss')\n", + "\n", + " # Scatter plot of indices: patterns_tensor vs. output_first_order\n", + " axs[1, 0].scatter(max_indices_patterns_tensor, max_indices_output_first_order, alpha=0.5)\n", + "\n", + " # Add quadratic regression line\n", + " indices_regression = perform_quadratic_regression(max_indices_patterns_tensor, max_indices_output_first_order)\n", + " axs[1, 0].plot(max_indices_patterns_tensor, indices_regression, color='skyblue', linestyle='--', label='Quadratic Fit')\n", + "\n", + " axs[1, 0].set_title('Stimuli location: First Order Input vs. First Order Output')\n", + " axs[1, 0].set_xlabel('First Order Input Indices')\n", + " axs[1, 0].set_ylabel('First Order Output Indices')\n", + " axs[1, 0].legend()\n", + "\n", + " # Scatter plot of values: patterns_tensor vs. output_first_order\n", + " axs[1, 1].scatter(max_values_patterns_tensor, max_values_output_first_order, alpha=0.5)\n", + "\n", + " # Add quadratic regression line\n", + " values_regression = perform_quadratic_regression(max_values_patterns_tensor, max_values_output_first_order)\n", + " axs[1, 1].plot(max_values_patterns_tensor, values_regression, color='skyblue', linestyle='--', label='Quadratic Fit')\n", + "\n", + " axs[1, 1].set_title('Stimuli Values: First Order Input vs. 
First Order Output')\n", + " axs[1, 1].set_xlabel('First Order Input Values')\n", + " axs[1, 1].set_ylabel('First Order Output Values')\n", + " axs[1, 1].legend()\n", + "\n", + " plt.suptitle(title, fontsize=16, y=1.02)\n", + "\n", + " # Display the plots in a 2x2 grid\n", + " plt.tight_layout()\n", + " plt.savefig('Blindsight_Pre_training_Loss_{}.png'.format(title.replace(\" \", \"_\").replace(\"/\", \"_\")), bbox_inches='tight')\n", + " plt.show()\n", + " plt.close(fig)\n", + "\n", + "# Function to configure the training environment and load the models\n", + "def config_training(first_order_network, second_order_network, hidden, factor, gelu):\n", + " \"\"\"\n", + " Configures the training environment by saving the state of the given models and loading them back.\n", + " Initializes testing patterns for evaluation.\n", + "\n", + " Parameters:\n", + " - first_order_network: The first order network instance.\n", + " - second_order_network: The second order network instance.\n", + " - hidden: Number of hidden units in the first order network.\n", + " - factor: Factor influencing the network's architecture.\n", + " - gelu: Activation function to be used in the network.\n", + "\n", + " Returns:\n", + " - Tuple of testing patterns, number of samples in the testing patterns, and the loaded model instances.\n", + " \"\"\"\n", + " # Paths where the models' states will be saved\n", + " PATH = './cnn1.pth'\n", + " PATH_2 = './cnn2.pth'\n", + "\n", + " # Save the weights of the pretrained networks to the specified paths\n", + " torch.save(first_order_network.state_dict(), PATH)\n", + " torch.save(second_order_network.state_dict(), PATH_2)\n", + "\n", + " # Generating testing patterns for three different sets\n", + " First_set, First_set_targets = create_patterns(0,factor)\n", + " Second_set, Second_set_targets = create_patterns(1,factor)\n", + " Third_set, Third_set_targets = create_patterns(2,factor)\n", + "\n", + " # Aggregate testing patterns and their targets for ease of 
access\n", + " Testing_patterns = [[First_set, First_set_targets], [Second_set, Second_set_targets], [Third_set, Third_set_targets]]\n", + "\n", + " # Determine the number of samples from the first set (assumed consistent across all sets)\n", + " n_samples = len(Testing_patterns[0][0])\n", + "\n", + " # Initialize and load the saved states into model instances\n", + " loaded_model = FirstOrderNetwork(hidden, factor, gelu)\n", + " loaded_model_2 = SecondOrderNetwork(gelu)\n", + "\n", + " loaded_model.load_state_dict(torch.load(PATH))\n", + " loaded_model_2.load_state_dict(torch.load(PATH_2))\n", + "\n", + " # Ensure the models are moved to the appropriate device (CPU/GPU) and set to evaluation mode\n", + " loaded_model.to(device)\n", + " loaded_model_2.to(device)\n", + "\n", + " loaded_model.eval()\n", + " loaded_model_2.eval()\n", + "\n", + " return Testing_patterns, n_samples, loaded_model, loaded_model_2" + ] + }, + { + "cell_type": "markdown", + "id": "de910a5b", + "metadata": { + "execution": {} + }, + "source": [ + "---\n", + "\n", + "# Introduction\n", + "\n", + "This bonus tutorial extends the content covered in Tutorial 1 around the theme of consciousness. At the end of Section 2 there, we discussed and implemented a range of first-order models and briefly mentioned second-order models. In this tutorial, we develop those ideas further and model the effects of blindsight, the phenomenon introduced earlier today, in which patients have no conscious experience of sight yet can still navigate around objects (showing that their brains process sensory information, but that it does not reach the level of subjective experience). We first introduce the code for the first-order model, followed by the second-order model. Then we show you some ways to plot the results from these models.\n", + "\n", + "After this, we end with some further high-level thoughts on the theme of consciousness. 
\n" + ] + }, + { + "cell_type": "markdown", + "id": "76dd7488-6558-4022-8541-22765f2967c6", + "metadata": { + "execution": {} + }, + "source": [ + "## Section 1: Train a First-Order Network\n", + "\n", + "This section invites you to engage with a straightforward, auto-generated dataset on blindsight, originally introduced by [Pasquali et al. in 2010](https://www.sciencedirect.com/science/article/abs/pii/S0010027710001794). Blindsight is a fascinating condition where individuals who are cortically blind due to damage in their primary visual cortex can still respond to visual stimuli without conscious perception. This intriguing phenomenon underscores the intricate nature of sensory processing and the brain's ability to process information without conscious awareness." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8c79b0a2-8e12-44ea-a685-bba788f6685d", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Visualize the autogenerated data\n", + "factor=2\n", + "initialize_global()\n", + "set_pre, _ = create_patterns(0,factor)\n", + "plot_signal_max_and_indicator(set_pre.detach().cpu(), \"Example - Pre training dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "cff70408-8662-43f5-b930-fc2a6ffca323", + "metadata": { + "execution": {} + }, + "source": [ + "The pre-training dataset for the network consisted of 200 patterns. These were evenly divided: half were purely noise (with unit activations randomly chosen between 0.0 and 0.02), and the other half represented potential stimuli. In the stimulus patterns, 99 out of 100 units had activations ranging between 0.0 and 0.02, with one unique unit having an activation between 0.0 and 1.0." 
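The dataset description above can be sketched directly. The following is a minimal NumPy illustration of the stated recipe (200 patterns, half noise in [0.0, 0.02], half with one stimulus unit in [0.0, 1.0]) — it is not the tutorial's actual `create_patterns` helper, whose implementation lives in the hidden setup cells:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_units = 100      # input layer size used throughout the tutorial
n_per_class = 100  # 100 noise + 100 stimulus patterns = 200 total

# Noise patterns: every unit uniformly in [0.0, 0.02)
noise = rng.uniform(0.0, 0.02, size=(n_per_class, n_units))

# Stimulus patterns: 99 noise-level units plus one randomly placed
# stimulus unit whose activation is uniform in [0.0, 1.0)
stimuli = rng.uniform(0.0, 0.02, size=(n_per_class, n_units))
stim_positions = rng.integers(0, n_units, size=n_per_class)
stimuli[np.arange(n_per_class), stim_positions] = rng.uniform(0.0, 1.0, size=n_per_class)

patterns = np.vstack([noise, stimuli])  # shape (200, 100)
```

The `plot_signal_max_and_indicator` helper above visualizes exactly this structure: the per-pattern maximum, and whether it exceeds 0.5.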
+ ] + }, + { + "cell_type": "markdown", + "id": "0f45662a-08b4-44a4-89e1-200fc0c9cddb", + "metadata": { + "execution": {} + }, + "source": [ + "**Testing patterns**\n", + "\n", + "As we have seen before, the network underwent evaluations under three distinct conditions, each modifying the signal-to-noise ratio in a unique way to explore different degrees and types of blindness.\n", + "\n", + "Suprathreshold stimulus condition: here, the network was exposed to the identical set of 200 patterns used during pre-training, testing the network's response to familiar inputs.\n", + "\n", + "Subthreshold stimulus condition (blindsight simulation): this condition aimed to mimic blindsight. It was achieved by introducing a slight noise increment (+0.0012) to every input of the first-order network, barring the one designated as the stimulus. This setup tested the network's ability to discern faint signals amidst noise.\n", + "\n", + "Low vision condition: to simulate low vision, the activation levels of the stimuli were reduced. Unlike the range from 0.0 to 1.0 used in pre-training, the stimuli's activation levels were adjusted to span from 0.0 to 0.3. This condition examined the network's capability to recognize stimuli with diminished intensity." 
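The three test conditions amount to simple transforms of a stimulus pattern. Below is a rough sketch under the stated assumptions (+0.0012 noise everywhere except the stimulus unit for the subthreshold case; stimulus rescaled from [0.0, 1.0] to [0.0, 0.3] for low vision). The helper names here are hypothetical — the tutorial's `create_patterns(condition, factor)` encapsulates all of this:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_units = 100

def make_stimulus():
    # One stimulus unit in [0, 1), all others at noise level [0, 0.02)
    p = rng.uniform(0.0, 0.02, n_units)
    stim_idx = int(rng.integers(n_units))
    p[stim_idx] = rng.uniform(0.0, 1.0)
    return p, stim_idx

def subthreshold(pattern, stim_idx):
    # Blindsight simulation: +0.0012 added to every unit except the stimulus
    out = pattern + 0.0012
    out[stim_idx] = pattern[stim_idx]
    return out

def low_vision(pattern, stim_idx):
    # Rescale the stimulus activation from the [0, 1] range down to [0, 0.3]
    out = pattern.copy()
    out[stim_idx] *= 0.3
    return out

p, idx = make_stimulus()
sub = subthreshold(p, idx)
low = low_vision(p, idx)
```

Both transforms shrink the gap between stimulus and noise, which is what makes the subthreshold and low-vision conditions harder for the network.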
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "db58d78b-17d8-4651-801a-f06e568a7322", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "factor=2\n", + "# Compare your results with the patterns generated below\n", + "set_1, _ = create_patterns(0,factor)\n", + "set_2, _ = create_patterns(1,factor)\n", + "set_3, _ = create_patterns(2,factor)\n", + "\n", + "# Plot\n", + "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", + "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", + "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "cd5c13e0-75e8-45c2-b1be-70496041364b", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 1: Building a network for a blindsight situation\n", + "\n", + "In this activity, we'll construct a neural network model using our auto-generated dataset, focusing on blindsight scenarios. The model will primarily consist of fully connected layers, establishing a straightforward, first-order network. The aim here is to assess the basic network's performance.\n", + "\n", + "**Steps to follow**\n", + "\n", + "1. Examine the network architecture: understand the structure of the neural network you're about to work with.\n", + "2. Visualize loss metrics: observe and analyze the network's performance during pre-training by visualizing the loss over epochs.\n", + "3. Evaluate the model: use the provided code snippets to calculate and interpret the model's accuracy, recall, and F1-score, giving you insight into the network's capabilities.\n", + "\n", + "**Understanding the process**\n", + "\n", + "The goal is to gain a thorough comprehension of the network's architecture and to interpret the pre-training results visually. 
This will provide a clearer picture of the model's potential and limitations.\n", + "\n", + "The network is designed as a backpropagation autoassociator. It features a 100-unit input layer, directly linked to a 40-unit hidden layer, which in turn connects to a 100-unit output layer. Initial connection weights are set within the range of -1.0 to 1.0 for the first-order network. To mitigate overfitting, dropout is employed within the network architecture. The architecture includes a configurable activation function. This flexibility allows for adjustments and tuning in Activity 3, aiming for optimal model performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94d0bcaf-8b49-4e35-b0d2-1b9dcc98b182", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "class FirstOrderNetwork(nn.Module):\n", + " def __init__(self, hidden_units, data_factor, use_gelu):\n", + " \"\"\"\n", + " Initializes the FirstOrderNetwork with specific configurations.\n", + "\n", + " Parameters:\n", + " - hidden_units (int): The number of units in the hidden layer.\n", + " - data_factor (int): Factor to scale the amount of data processed.\n", + " A factor of 1 indicates the default data amount,\n", + " while 10 indicates 10 times the default amount.\n", + " - use_gelu (bool): Flag to use GELU (True) or ReLU (False) as the activation function.\n", + " \"\"\"\n", + " super(FirstOrderNetwork, self).__init__()\n", + "\n", + " # Define the encoder, hidden, and decoder layers with specified units\n", + "\n", + " self.fc1 = nn.Linear(100, hidden_units, bias = False) # Encoder\n", + " self.hidden= nn.Linear(hidden_units, hidden_units, bias = False) # Hidden\n", + " self.fc2 = nn.Linear(hidden_units, 100, bias = False) # Decoder\n", + "\n", + " self.relu = nn.ReLU()\n", + " self.sigmoid = nn.Sigmoid()\n", + "\n", + "\n", + " # Dropout layer to prevent overfitting\n", + " self.dropout = nn.Dropout(0.1)\n", + "\n", + " # Set the data factor\n", + " 
self.data_factor = data_factor\n", + "\n", + " # Other activation functions for various purposes\n", + " self.softmax = nn.Softmax()\n", + "\n", + " # Initialize network weights\n", + " self.initialize_weights()\n", + "\n", + " def initialize_weights(self):\n", + " \"\"\"Initializes weights of the encoder, hidden, and decoder layers uniformly.\"\"\"\n", + " init.uniform_(self.fc1.weight, -1.0, 1.0)\n", + "\n", + " init.uniform_(self.fc2.weight, -1.0, 1.0)\n", + " init.uniform_(self.hidden.weight, -1.0, 1.0)\n", + "\n", + " def encoder(self, x):\n", + " h1 = self.dropout(self.relu(self.fc1(x.view(-1, 100))))\n", + " return h1\n", + "\n", + " def decoder(self,z):\n", + " #h2 = self.relu(self.hidden(z))\n", + " h2 = self.sigmoid(self.fc2(z))\n", + " return h2\n", + "\n", + "\n", + " def forward(self, x):\n", + " \"\"\"\n", + " Defines the forward pass through the network.\n", + "\n", + " Parameters:\n", + " - x (Tensor): The input tensor to the network.\n", + "\n", + " Returns:\n", + " - Tensor: The output of the network after passing through the layers and activations.\n", + " \"\"\"\n", + " h1 = self.encoder(x)\n", + " h2 = self.decoder(h1)\n", + "\n", + " return h1 , h2" + ] + }, + { + "cell_type": "markdown", + "id": "83e07f1a-540b-4bfa-8e9b-f114319f1f96", + "metadata": { + "execution": {} + }, + "source": [ + "For now, we will train the first-order network only." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4202ab0d", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the architecture, optimizers, loss functions, and schedulers for pre-training\n", + "seeds=15\n", + "\n", + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Hyperparameters\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + "factor=2\n", + "gelu=False\n", + "gam=0.98\n", + "meta=True\n", + "stepsize=25\n", + "\n", + "for i in range(seeds):\n", + " print(f\"Seed {i}\")\n", + "\n", + " # Reinitialize global state for each seed\n", + " initialize_global()\n", + "\n", + " # Prepare networks, loss functions, optimizers, and schedulers for pre-training\n", + " first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2 = prepare_pre_training(hidden, factor, gelu, stepsize, gam)\n", + "\n", + " # Conduct pre-training for both the first-order and second-order networks\n", + " first_order_network_pre, second_order_network_pre, epoch_1_order, epoch_2_order , max_value_indices = pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta)\n", + "\n", + " # Plot the training progress of both networks to visualize performance and learning trends\n", + " pre_train_plots(epoch_1_order, epoch_2_order, f\"1st & 2nd Order Networks - Seed {i}\" , max_value_indices )\n", + "\n", + " # Configuration step for the main training phase or evaluation\n", + " testing_patterns, n_samples = get_test_patterns(factor)\n", + "\n", + " # Function to test the model using the configured testing patterns\n", + " first_order_network_pre.eval()\n", + " second_order_network_pre.eval()\n", + " f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, first_order_network_pre, 
second_order_network_pre,factor)\n", + " results_seed.append(results_for_plotting)\n", + " discrimination_seed.append(discrimination_performances)\n", + "\n", + "plot_testing(results_seed, discrimination_seed, seeds, \"Test Results\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7bfade3d-6385-459c-8f07-e3017264455a", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "\n", + "# Hyperparameters\n", + "global optimizer ,n_epochs , learning_rate_1\n", + "learning_rate_1 = 0.5\n", + "n_epochs = 100\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + "factor=2\n", + "gelu=False\n", + "gam=0.98\n", + "meta=True\n", + "stepsize=25\n", + "initialize_global()\n", + "\n", + "\n", + "# Networks instantiation\n", + "first_order_network = FirstOrderNetwork(hidden,factor,gelu).to(device)\n", + "second_order_network = SecondOrderNetwork(gelu).to(device) # We define it, but won't use it until activity 3\n", + "\n", + "# Loss function\n", + "criterion_1 = CAE_loss\n", + "\n", + "# Optimizer\n", + "optimizer_1 = optim.Adamax(first_order_network.parameters(), lr=learning_rate_1)\n", + "\n", + "# Learning rate schedulers\n", + "scheduler_1 = StepLR(optimizer_1, step_size=stepsize, gamma=gam)\n", + "\n", + "max_values_output_first_order = []\n", + "max_indices_output_first_order = []\n", + "max_values_patterns_tensor = []\n", + "max_indices_patterns_tensor = []\n", + "\n", + "# Training loop\n", + "for epoch in range(n_epochs):\n", + " # Generate training patterns and targets for each epoch.\n", + " patterns_tensor, stim_present_tensor, stim_absent_tensor, order_2_tensor = generate_patterns(patterns_number, num_units,factor, 0)\n", + "\n", + " # Forward pass through the first-order network\n", + " hidden_representation , output_first_order = first_order_network(patterns_tensor)\n", + "\n", + " 
output_first_order=output_first_order.requires_grad_(True)\n", + "\n", + " # Skip computations for the second-order network\n", + " with torch.no_grad():\n", + "\n", + " # Potentially forward pass through the second-order network without tracking gradients\n", + " output_second_order = second_order_network(patterns_tensor, output_first_order)\n", + "\n", + " # Calculate the loss for the first-order network (accuracy of stimulus representation)\n", + " W = first_order_network.state_dict()['fc1.weight']\n", + " loss_1 = criterion_1( W, stim_present_tensor.view(-1, 100), output_first_order,\n", + " hidden_representation, lam )\n", + " # Backpropagate the first-order network's loss\n", + " loss_1.backward()\n", + "\n", + " # Update first-order network weights\n", + " optimizer_1.step()\n", + "\n", + " # Reset first-order optimizer gradients to zero for the next iteration\n", + " optimizer_1.zero_grad()\n", + "\n", + " # Update the first-order scheduler\n", + " scheduler_1.step()\n", + "\n", + " epoch_1_order[epoch] = loss_1.item()\n", + "\n", + " # Get max values and indices for output_first_order\n", + " max_vals_out, max_inds_out = torch.max(output_first_order[100:], dim=1)\n", + " max_inds_out[max_vals_out == 0] = 0\n", + " max_values_output_first_order.append(max_vals_out.tolist())\n", + " max_indices_output_first_order.append(max_inds_out.tolist())\n", + "\n", + " # Get max values and indices for patterns_tensor\n", + " max_vals_pat, max_inds_pat = torch.max(patterns_tensor[100:], dim=1)\n", + " max_inds_pat[max_vals_pat == 0] = 0\n", + " max_values_patterns_tensor.append(max_vals_pat.tolist())\n", + " max_indices_patterns_tensor.append(max_inds_pat.tolist())\n", + "\n", + "\n", + "max_values_indices = (max_values_output_first_order[-1],\n", + " max_indices_output_first_order[-1],\n", + " max_values_patterns_tensor[-1],\n", + " max_indices_patterns_tensor[-1])\n", + "\n", + "\n", + "# Plot training loss curve\n", + "pre_train_plots(epoch_1_order, epoch_2_order, \"1st & 2nd Order Networks\" , 
max_values_indices )" + ] + }, + { + "cell_type": "markdown", + "id": "fa7d1ce9-da1b-4f78-b388-47bfcd50c6dd", + "metadata": { + "execution": {} + }, + "source": [ + "### Testing under 3 Blindsight Conditions\n", + "\n", + "We will now use the testing auto-generated datasets from activity 1 to test the network's performance." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2affe162-f4d9-495f-862a-65b0f50ca5ef", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Prepare networks for testing by calling the configuration function\n", + "testing_patterns, n_samples, loaded_model, loaded_model_2 = config_training(first_order_network, second_order_network, hidden, factor, gelu)\n", + "\n", + "# Perform testing using the defined function and plot the results\n", + "f1_scores_wager, mse_losses_indices , mse_losses_values , discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, loaded_model, loaded_model_2,factor)\n", + "\n", + "results_seed.append(results_for_plotting)\n", + "discrimination_seed.append(discrimination_performances)\n", + "# Call plot_testing to display the results\n", + "plot_testing(results_seed,discrimination_seed, 1, \"Seed\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0d18302e-4657-4732-b6ef-f7439d2bb2fd", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_First_order_network\")" + ] + }, + { + "cell_type": "markdown", + "id": "96579a08-3c95-4dfe-9908-fabe1bb146d0", + "metadata": { + "execution": {} + }, + "source": [ + "## Section 2: Train a Second-Order Network" + ] + }, + { + "cell_type": "markdown", + "id": "caac41bc-5a93-43bf-aede-7c1e87e83fbd", + "metadata": { + "execution": {} + }, + "source": [ + "Having previously examined the first-order 
network, we now switch to the second-order network, described in more detail in Tutorial 1 (please revisit the text and video content there if you need to recap the concepts or refresh your understanding of the difference between these models).\n", + "\n", + "To study this, we use a simulated dataset that mimics the conditions of blindsight. This dataset contains 400 patterns, equally split between two types:\n", + "\n", + "- **Random noise patterns**: low activations on every unit, ranging between 0.0 and 0.02.\n", + "- **Designed stimulus patterns**: each pattern includes one unit that shows a higher activation level, varying between 0.0 and 1.0.\n", + "\n", + "This dataset allows us to test hypotheses concerning how sensory processing and network responses adapt under different conditions of visual impairment.\n", + "\n", + "We have three main testing scenarios, each designed to alter the signal-to-noise ratio to simulate different levels of visual impairment:\n", + "\n", + "- **Suprathreshold stimulus condition**: here, the network is tested against familiar patterns used during training to assess its response to known stimuli.\n", + "- **Subthreshold stimulus condition**: this condition slightly increases the noise level, akin to actual blindsight conditions, testing the network's capability to discern subtle signals.\n", + "- **Low vision condition**: the intensity of stimuli is decreased to evaluate how well the network performs with significantly reduced sensory input." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b549db9-e8b0-4c49-89d2-b7324b3a4ed1", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "factor=2\n", + "\n", + "initialize_global()\n", + "set_1, _ = create_patterns(0,factor)\n", + "set_2, _ = create_patterns(1,factor)\n", + "set_3, _ = create_patterns(2,factor)\n", + "\n", + "# Plot\n", + "plot_signal_max_and_indicator(set_1.detach().cpu(), \"Suprathreshold dataset\")\n", + "plot_signal_max_and_indicator(set_2.detach().cpu(), \"Subthreshold dataset\")\n", + "plot_signal_max_and_indicator(set_3.detach().cpu(), \"Low Vision dataset\")" + ] + }, + { + "cell_type": "markdown", + "id": "96a91af5-c498-429d-a407-afa66d7444db", + "metadata": { + "execution": {} + }, + "source": [ + "The first-order network model lays the groundwork for our experiments and is structured as follows:\n", + "\n", + "- Input layer: consists of 100 units representing either noise or stimulus patterns.\n", + "- Hidden layer: includes a 40-unit layer tasked with processing the inputs.\n", + "- Output layer: comprises 100 units where the responses to stimuli are recorded.\n", + "- Dropout and activation: includes dropout layers to prevent overfitting and a temperature-controlled activation function to fine-tune response sharpness.\n", + "\n", + "The primary aim of the first-order network is to accurately capture and react to the input patterns, setting a baseline for comparison with more complex models." + ] + }, + { + "cell_type": "markdown", + "id": "768e074d-1a07-4f3e-8a5d-de31849e7730", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 2: Developing a Second-Order Network\n", + "\n", + "Your task is to expand upon the first-order network by integrating a second-order network that incorporates a metacognitive layer assessing the predictions of the first-order network. 
This metacognitive layer introduces a wagering mechanism, wherein the network \"bets\" on its confidence in its predictions. \n", + "\n", + "- The first-order network is designed as an autoencoder, a type of neural network trained to reconstruct the input stimulus. The autoencoder consists of an encoder that compresses the input into a latent representation and a decoder that reconstructs the input from this representation.\n", + "- The second-order network, or metacognitive layer, operates by examining the difference (delta) between the original input and the output generated by the autoencoder. This difference provides insight into the reconstruction error, which is a measure of how accurately the autoencoder has learned to replicate the input data. By evaluating this reconstruction error, the second-order network can make a judgement about the certainty of the first-order network's predictions.\n", + "\n", + "These are the steps for completion:\n", + "\n", + "1. Architectural development: grasp the underlying principles of a second-order network and complete the architectural code.\n", + "2. Performance evaluation: visualize training losses and test the model using provided code, assessing its initial performance.\n", + "3. Model fine-tuning: leveraging the provided training function, experiment with fine-tuning the model to enhance its accuracy and efficiency.\n", + "\n", + "The second-order network is structured as a feedforward backpropagation network.\n", + "\n", + "- Input layer: comprises a 100-unit comparison matrix. This matrix quantifies the discrepancy between each corresponding pair of input and output units from the first-order network. For example, if an input unit and its corresponding output unit have activations of 0.6 and 0.7, respectively, the comparison unit's activation would be -0.1. 
This setup essentially encodes the prediction error of the first-order network's outputs as an input pattern for the second-order network.\n", + "- Output layer: consists of two units representing \"high\" and \"low\" wagers, indicating the network's confidence in its predictions. The initial weights for these output units range between 0.0 and 0.1.\n", + "- Comparator weights: set to 1.0 for connections from the first-order input layer to the comparison matrix, and -1.0 for connections from the first-order output layer. This configuration emphasizes the differential error as a critical input for the second-order decision-making process.\n", + "\n", + "The second-order network's novel approach uses the error generated by the first-order network as a direct input for making decisions—specifically, wagering on the confidence of its outputs. This methodology reflects a metacognitive layer of processing, akin to evaluating one's confidence in their answers or predictions.\n", + "\n", + "By exploring these adjustments, you can optimize the network's functionality, making it a powerful tool for understanding and simulating complex cognitive phenomena like blindsight." 
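The comparison-matrix arithmetic described above is just an element-wise difference between corresponding first-order input and output units. A tiny sketch of the worked example (input 0.6 vs. output 0.7), using NumPy arrays as stand-ins for the network's tensors:

```python
import numpy as np

# Three corresponding input/output unit pairs from the first-order network
first_order_input = np.array([0.60, 0.01, 0.02])
first_order_output = np.array([0.70, 0.01, 0.05])

# Each comparison unit encodes the first-order reconstruction error;
# the first entry is approximately -0.1, matching the example in the text
comparison_matrix = first_order_input - first_order_output
```

Large-magnitude entries signal a poor reconstruction, which is exactly the evidence the wagering layer uses to bet "low" on the first-order network's prediction.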
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2c37e357-e5e6-40b2-8507-f83161f5d85f", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "class SecondOrderNetwork(nn.Module):\n", + " def __init__(self, use_gelu):\n", + " super(SecondOrderNetwork, self).__init__()\n", + " # Define a linear layer for comparing the difference between input and output of the first-order network\n", + " self.comparison_layer = nn.Linear(100, 100)\n", + "\n", + " # Linear layer for determining wagers, mapping from 100 features to a single output\n", + " self.wager = nn.Linear(100, 1)\n", + "\n", + " # Dropout layer to prevent overfitting by randomly setting input units to 0 with a probability of 0.5 during training\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " # Select activation function based on the `use_gelu` flag\n", + " self.activation = nn.functional.gelu if use_gelu else torch.relu\n", + "\n", + " # Additional activation functions for potential use in network operations\n", + " self.sigmoid = torch.sigmoid\n", + "\n", + " self.softmax = nn.Softmax(dim=1)\n", + "\n", + " # Initialize the weights of the network\n", + " self._init_weights()\n", + "\n", + " def _init_weights(self):\n", + " # Uniformly initialize weights for the comparison and wager layers\n", + " init.uniform_(self.comparison_layer.weight, -1.0, 1.0)\n", + " init.uniform_(self.wager.weight, 0.0, 0.1)\n", + "\n", + " def forward(self, first_order_input, first_order_output):\n", + " ############################################################\n", + " # Fill in the wager value\n", + " # Applying dropout and sigmoid activation to the output of the wager layer\n", + " raise NotImplementedError(\"Student exercise\")\n", + " 
############################################################\n", + "\n", + " # Calculate the difference between the first-order input and output\n", + " comparison_matrix = first_order_input - first_order_output\n", + "\n", + " #Another option is to directly calculate the per unit MSE to use as input for the comparator matrix\n", + " #comparison_matrix = nn.MSELoss(reduction='none')(first_order_output, first_order_input)\n", + "\n", + " # Pass the difference through the comparison layer and apply the chosen activation function\n", + " comparison_out=self.dropout(self.activation(self.comparison_layer(comparison_matrix)))\n", + "\n", + " # Calculate the wager value, applying dropout and sigmoid activation to the output of the wager layer\n", + " wager = ...\n", + "\n", + " return wager" + ] + }, + { + "cell_type": "markdown", + "id": "4d931cb5-a87a-48be-8760-79512b9d88f7", + "metadata": { + "colab_type": "text", + "execution": {} + }, + "source": [ + "[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial3_Solution_a926812a.py)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "736319ec-2a17-4d80-bb04-b9507ba5db5d", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Second_Order_Network\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "947c8550-a40d-43aa-bfd6-1eb8cead339f", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# First order network instantiation\n", + "first_order_network = FirstOrderNetwork(hidden, factor, gelu).to(device)\n", + "\n", + "# Define the architecture, optimizers, loss functions, and schedulers for pre training\n", + "seeds=15\n", + "\n", + "results_seed=[]\n", + "discrimination_seed=[]\n", + "\n", + "# Hyperparameters\n", + "optimizer=\"ADAMAX\"\n", + "hidden=40\n", + 
"factor = 2\n", + "gelu = False\n", + "gam = 0.98\n", + "meta = True\n", + "stepsize = 25\n", + "\n", + "for i in range(seeds):\n", + "    print(f\"Seed {i}\")\n", + "\n", + "    # Compare your results with the patterns generated below\n", + "    initialize_global()\n", + "\n", + "    # Prepare networks, loss functions, optimizers, and schedulers for pre-training\n", + "    first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2 = prepare_pre_training(hidden, factor, gelu, stepsize, gam)\n", + "\n", + "    # Conduct pre-training for both the first-order and second-order networks\n", + "    first_order_network_pre, second_order_network_pre, epoch_1_order, epoch_2_order, max_value_indices = pre_train(first_order_network, second_order_network, criterion_1, criterion_2, optimizer_1, optimizer_2, scheduler_1, scheduler_2, factor, meta)\n", + "\n", + "    # Plot the training progress of both networks to visualize performance and learning trends\n", + "    pre_train_plots(epoch_1_order, epoch_2_order, f\"1st & 2nd Order Networks - Seed {i}\", max_value_indices)\n", + "\n", + "    # Configuration step for the main training phase or evaluation\n", + "    testing_patterns, n_samples = get_test_patterns(factor)\n", + "\n", + "    # Test the models using the configured testing patterns\n", + "    first_order_network_pre.eval()\n", + "    second_order_network_pre.eval()\n", + "    f1_scores_wager, mse_losses_indices, mse_losses_values, discrimination_performances, results_for_plotting = testing(testing_patterns, n_samples, first_order_network_pre, second_order_network_pre, factor)\n", + "    results_seed.append(results_for_plotting)\n", + "    discrimination_seed.append(discrimination_performances)\n", + "\n", + "plot_testing(results_seed, discrimination_seed, seeds, \"Test Results\")" + ] + }, + { + "cell_type": "markdown", + "id": "2047ee8a-4ebc-41dc-a77a-4e17f7c74947", + "metadata": { + "execution": {} + }, + "source": [ + "### Discussion point\n", + "\n", + 
"Let's dive into the outcomes!\n", + "\n", + "- Did you notice any variations between the two models?\n", + "- Can you explain how these differences influenced the performance?\n", + "- What role does a second-order network play, and in which situations would it be more effective?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55115815-beb2-4f19-a598-9b129ff87637", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Discussion_Point_Second_Order_Network\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a5a880a9-a069-4e0f-a481-f3b85b6a3952", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Video 1: Second Order Network\n", + "\n", + "from ipywidgets import widgets\n", + "from IPython.display import YouTubeVideo\n", + "from IPython.display import IFrame\n", + "from IPython.display import display\n", + "\n", + "class PlayVideo(IFrame):\n", + " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", + " self.id = id\n", + " if source == 'Bilibili':\n", + " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", + " elif source == 'Osf':\n", + " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", + " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", + "\n", + "def display_videos(video_ids, W=400, H=300, fs=1):\n", + " tab_contents = []\n", + " for i, video_id in enumerate(video_ids):\n", + " out = widgets.Output()\n", + " with out:\n", + " if video_ids[i][0] == 'Youtube':\n", + " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", + " height=H, fs=fs, rel=0)\n", + " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", + " else:\n", + " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", + " height=H, fs=fs, 
autoplay=False)\n", + " if video_ids[i][0] == 'Bilibili':\n", + " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", + " elif video_ids[i][0] == 'Osf':\n", + " print(f'Video available at https://osf.io/{video.id}')\n", + " display(video)\n", + " tab_contents.append(out)\n", + " return tab_contents\n", + "\n", + "video_ids = [('Youtube', 'lHRP14mxXv8'), ('Bilibili', 'BV1jM4m1S7ek')]\n", + "tab_contents = display_videos(video_ids, W=854, H=480)\n", + "tabs = widgets.Tab()\n", + "tabs.children = tab_contents\n", + "for i in range(len(tab_contents)):\n", + " tabs.set_title(i, video_ids[i][0])\n", + "display(tabs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8a54d67b-507e-4a8a-9715-0aacdeb06f26", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Video_1\")" + ] + }, + { + "cell_type": "markdown", + "id": "8a694f1e-3f32-48fc-bce8-0b544d43ca62", + "metadata": { + "execution": {} + }, + "source": [ + "### Coding Exercise 3: Plot Surfaces for Content / Awareness Inference\n", + "\n", + "To explore the properties of the HOSS model, we can simulate inference at different levels of the hierarchy over the full 2D space of possible input X's. The left panel below shows that the probability of awareness (of any stimulus contents) rises in a graded manner from the lower left corner of the graph (low activation of any feature) to the upper right (high activation of both features). In contrast, the right panel shows that confidence in making a discrimination response (e.g. rightward vs. 
leftward) increases away from the major diagonal, as the model becomes sure that the sample was generated by either a leftward or rightward tilted stimulus.\n", + "\n", + "Together, the two surfaces make predictions about the relationships we might see between discrimination confidence and awareness in a simple psychophysics experiment. One notable prediction is that discrimination could still be possible - and lead to some degree of confidence - even when the higher-order node is \"reporting\" unawareness of the stimulus.\n", + "\n", + "Now, let's get hands-on and plot those auto-generated patterns!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "77fbfe70", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "def HOSS_evaluate(X, mu, Sigma, Aprior, Wprior):\n", + "    \"\"\"\n", + "    Inference on 2D Bayes net for asymmetric inference on presence vs. absence.\n", + "    \"\"\"\n", + "\n", + "    # Initialise variables and conditional prob tables\n", + "    p_A = np.array([1 - Aprior, Aprior]) # prior on awareness state A\n", + "    p_W_a1 = np.append(0, Wprior) # likelihood of world states W given aware, first entry is absence\n", + "    p_W_a0 = np.append(1, np.zeros(len(Wprior))) # likelihood of world states W given unaware, first entry is absence\n", + "    p_W = (p_W_a1 + p_W_a0) / 2 # prior on W marginalising over A (for KL)\n", + "\n", + "    # Compute likelihood of observed X for each possible W (P(X|mu_w, Sigma))\n", + "    lik_X_W = np.array([multivariate_normal.pdf(X, mean=mu_i, cov=Sigma) for mu_i in mu])\n", + "    p_X_W = lik_X_W / lik_X_W.sum() # normalise across W; the constant cancels in the posteriors below\n", + "\n", + "    # Combine with likelihood of each world state w given awareness state A\n", + "    lik_W_A = np.vstack((p_X_W * p_W_a0 * p_A[0], p_X_W * p_W_a1 * p_A[1]))\n", + "    post_A = lik_W_A.sum(axis=1) # sum over W\n", + "    post_A = post_A / post_A.sum() # normalise\n", + "\n", + "    # Posterior over W (P(W|X=x) marginalising over A)\n", + "    post_W = lik_W_A.sum(axis=0) 
# sum over A\n", + " post_W = post_W / post_W.sum() # normalise\n", + "\n", + " # KL divergences\n", + " KL_W = (post_W * np.log(post_W / p_W)).sum()\n", + " KL_A = (post_A * np.log(post_A / p_A)).sum()\n", + "\n", + " return post_W, post_A, KL_W, KL_A" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31503073-a7c0-4502-8d94-5ffa47a22926", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the grid\n", + "xgrid = np.arange(0, 2.01, 0.01)\n", + "\n", + "# Define the means for the Gaussian distributions\n", + "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", + "\n", + "# Define the covariance matrix\n", + "Sigma = np.array([[1, 0], [0, 1]])\n", + "\n", + "# Prior probabilities\n", + "Wprior = np.array([0.5, 0.5])\n", + "Aprior = 0.5\n", + "\n", + "# Initialize arrays to hold confidence and posterior probability\n", + "confW = np.zeros((len(xgrid), len(xgrid)))\n", + "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", + "\n", + "# Compute confidence and posterior probability for each point in the grid\n", + "for i, xi in tqdm(enumerate(xgrid), total=len(xgrid), desc='Outer Loop'):\n", + " for j, xj in enumerate(xgrid):\n", + " X = [xi, xj]\n", + " post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", + " confW[i, j] = max(post_w[1], post_w[2])\n", + " posteriorAware[i, j] = post_A[1]\n", + "\n", + "with plt.xkcd():\n", + "\n", + " # Plotting\n", + " plt.figure(figsize=(10, 5))\n", + "\n", + " # Posterior probability \"seen\"\n", + " plt.subplot(1, 2, 1)\n", + " plt.contourf(xgrid, xgrid, posteriorAware.T)\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('Posterior probability \"seen\"')\n", + " plt.axis('square')\n", + "\n", + " # Confidence in identity\n", + " plt.subplot(1, 2, 2)\n", + " contour_set = plt.contourf(xgrid, xgrid, 
confW.T)\n", + "    plt.colorbar()\n", + "    plt.contour(xgrid, xgrid, posteriorAware.T, levels=[0.5], linewidths=4, colors=['white']) # Line contour for threshold\n", + "    plt.xlabel('X1')\n", + "    plt.ylabel('X2')\n", + "    plt.title('Confidence in identity')\n", + "    plt.axis('square')\n", + "\n", + "    plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2d129657-62aa-42d1-970a-93fd67736b69", + "metadata": { + "execution": {} + }, + "source": [ + "### Simulate KL-divergence surfaces\n", + "\n", + "We can also simulate KL-divergences (a measure of Bayesian surprise) at each layer in the network, which, under predictive coding models of the brain, have been proposed to scale with neural activation (e.g., Friston, 2005; Summerfield & de Lange, 2014)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "66044263-c8de-49a9-a56b-2e7336cc737c", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "# Define the grid\n", + "xgrid = np.arange(0, 2.01, 0.01)\n", + "\n", + "# Define the means for the Gaussian distributions\n", + "mu = np.array([[0.5, 0.5], [0.5, 1.5], [1.5, 0.5]])\n", + "\n", + "# Define the covariance matrix\n", + "Sigma = np.array([[1, 0], [0, 1]])\n", + "\n", + "# Prior probabilities\n", + "Wprior = np.array([0.5, 0.5])\n", + "Aprior = 0.5\n", + "\n", + "# Initialize arrays to hold confidence and posterior probability\n", + "confW = np.zeros((len(xgrid), len(xgrid)))\n", + "posteriorAware = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_w = np.zeros((len(xgrid), len(xgrid)))\n", + "KL_A = np.zeros((len(xgrid), len(xgrid)))\n", + "\n", + "# Compute confidence and posterior probability for each point in the grid\n", + "for i, xi in enumerate(xgrid):\n", + "    for j, xj in enumerate(xgrid):\n", + "        X = [xi, xj]\n", + "        post_w, post_A, KL_w[i, j], KL_A[i, j] = HOSS_evaluate(X, mu, Sigma, Aprior, Wprior)\n", + "\n", + "        confW[i, j] = max(post_w[1], post_w[2])\n", + "        posteriorAware[i, j] = post_A[1]\n", + "\n", + "# Calculate the 
mean K-L divergence for absent and present awareness states\n", + "KL_A_absent = np.mean(KL_A[posteriorAware < 0.5])\n", + "KL_A_present = np.mean(KL_A[posteriorAware >= 0.5])\n", + "KL_w_absent = np.mean(KL_w[posteriorAware < 0.5])\n", + "KL_w_present = np.mean(KL_w[posteriorAware >= 0.5])\n", + "\n", + "with plt.xkcd():\n", + "\n", + " # Plotting\n", + " plt.figure(figsize=(18, 6))\n", + "\n", + " # K-L divergence, perceptual states\n", + " plt.subplot(1, 2, 1)\n", + " plt.contourf(xgrid, xgrid, KL_w.T, cmap='viridis')\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('KL-divergence, perceptual states')\n", + " plt.axis('square')\n", + "\n", + " # K-L divergence, awareness state\n", + " plt.subplot(1, 2, 2)\n", + " plt.contourf(xgrid, xgrid, KL_A.T, cmap='viridis')\n", + " plt.colorbar()\n", + " plt.xlabel('X1')\n", + " plt.ylabel('X2')\n", + " plt.title('KL-divergence, awareness state')\n", + " plt.axis('square')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "b32e4908-0f6f-4259-832f-045adcb19700", + "metadata": { + "execution": {} + }, + "source": [ + "### Discussion point\n", + "\n", + "Can you recognise the difference between the KL divergence for the W-level and the one for the A-level?" 
+ ] + }, + { + "cell_type": "markdown", + "id": "7d8deb66-9a1d-49e1-a3ef-96970efa8d97", + "metadata": { + "colab_type": "text", + "execution": {} + }, + "source": [ + "[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W2D5_Mysteries/solutions/W2D5_Tutorial3_Solution_f903bbb4.py)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "869fc8f1-4199-4525-80b3-26e74babc66a", + "metadata": { + "execution": {} + }, + "outputs": [], + "source": [ + "with plt.xkcd():\n", + "\n", + " # Create figure with specified size\n", + " plt.figure(figsize=(10, 5))\n", + "\n", + " # KL divergence for W states\n", + " plt.subplot(1, 2, 1)\n", + " plt.bar(['unseen', 'seen'], [KL_w_absent, KL_w_present], color='k')\n", + " plt.ylabel('KL divergence, W states')\n", + " plt.xticks(fontsize=18)\n", + " plt.yticks(fontsize=18)\n", + "\n", + " # KL divergence for A states\n", + " plt.subplot(1, 2, 2)\n", + " plt.bar(['unseen', 'seen'], [KL_A_absent, KL_A_present], color='k')\n", + " plt.ylabel('KL divergence, A states')\n", + " plt.xticks(fontsize=18)\n", + " plt.yticks(fontsize=18)\n", + "\n", + " plt.tight_layout()\n", + "\n", + " # Show plot\n", + " plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "64ecb92c-bfe3-4e49-bd40-f11ffa685ece", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_HOSS_Bonus_Content\")" + ] + }, + { + "cell_type": "markdown", + "id": "bcd87344-d473-44af-a881-b68e5471d353", + "metadata": { + "execution": {} + }, + "source": [ + "---\n", + "# Discussion\n", + "This section contains an extra discussion exercise if you have time and inclination." 
+ ] + }, + { + "cell_type": "markdown", + "id": "ca33829c-8d54-437e-ba33-d3003af51d7a", + "metadata": { + "execution": {} + }, + "source": [ + "In this bonus section, Megan and Anil will delve into the complexities of defining and testing for consciousness, particularly in the context of artificial intelligence. We will explore various theoretical perspectives, examine classic and contemporary tests for consciousness, and discuss the challenges and ethical implications of determining whether a system truly possesses conscious experience." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "21b621ea-1639-4131-8ec3-9cdf34a64f77", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Video 2: Consciousness Bonus Content\n", + "\n", + "from ipywidgets import widgets\n", + "from IPython.display import YouTubeVideo\n", + "from IPython.display import IFrame\n", + "from IPython.display import display\n", + "\n", + "class PlayVideo(IFrame):\n", + " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", + " self.id = id\n", + " if source == 'Bilibili':\n", + " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", + " elif source == 'Osf':\n", + " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", + " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", + "\n", + "def display_videos(video_ids, W=400, H=300, fs=1):\n", + " tab_contents = []\n", + " for i, video_id in enumerate(video_ids):\n", + " out = widgets.Output()\n", + " with out:\n", + " if video_ids[i][0] == 'Youtube':\n", + " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", + " height=H, fs=fs, rel=0)\n", + " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", + " else:\n", + " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", + " height=H, fs=fs, autoplay=False)\n", + " if video_ids[i][0] == 
'Bilibili':\n", + " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", + " elif video_ids[i][0] == 'Osf':\n", + " print(f'Video available at https://osf.io/{video.id}')\n", + " display(video)\n", + " tab_contents.append(out)\n", + " return tab_contents\n", + "\n", + "video_ids = [('Youtube', '00dL8q7WgcU'), ('Bilibili', 'BV12n4y1Q7C2')]\n", + "tab_contents = display_videos(video_ids, W=854, H=480)\n", + "tabs = widgets.Tab()\n", + "tabs.children = tab_contents\n", + "for i in range(len(tab_contents)):\n", + " tabs.set_title(i, video_ids[i][0])\n", + "display(tabs)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39c202fb-f580-4a96-8f8e-bad24ed1d55c", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Video_2\")" + ] + }, + { + "cell_type": "markdown", + "id": "e9e839f2-e237-4ed4-9045-56dc7b5f6d60", + "metadata": { + "execution": {} + }, + "source": [ + "## Discussion activity: Is it actually conscious?" + ] + }, + { + "cell_type": "markdown", + "id": "2720c0b5-6386-43a6-9647-f1245531c376", + "metadata": { + "execution": {} + }, + "source": [ + "We discussed the difference between these two...\n", + "- \"Forward\" tests: passing means the machine is conscious (or intelligent).\n", + "- \"Reverse\" tests: passing means humans are convinced that a machine is conscious (or intelligent).\n", + "\n", + "**Discuss!** If a system (AI, other animal, other human) exhibited all the \"right signs\" of being conscious, how can we know for sure it is actually conscious? How could you design a test to be a true forward test?\n", + "\n", + "- Room 1: I think you could design a forward test in this way... 
[share your ideas]\n", + "- Room 2: I think a forward test is impossible, and here's why [share your ideas]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "84958157-c165-4cc3-be76-408999cf44ad", + "metadata": { + "cellView": "form", + "execution": {} + }, + "outputs": [], + "source": [ + "# @title Submit your feedback\n", + "content_review(f\"{feedback_prefix}_Discussion_activity\")" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "include_colab_link": true, + "name": "W2D5_Tutorial3", + "provenance": [], + "toc_visible": true + }, + "kernel": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.22" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}