feat(01_sampling): added spawn + better introduction
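
For reviewers, here is a minimal sketch of the spawning pattern that the new section 5 introduces. The helper name `spawn_generators` and the sample sizes are hypothetical and are not part of the notebook. The seed, the number of streams and `scale=12` are borrowed from the notebook cells purely for illustration.

```python
import numpy as np

# Illustrative values only; mirrors the notebook's n_streams / user_seed cell.
n_streams = 2
user_seed = 1


def spawn_generators(seed, n):
    """Hypothetical helper: build n Generators from one user-supplied seed."""
    # spawn() returns a list of child SeedSequence objects.
    children = np.random.SeedSequence(seed).spawn(n)
    return [np.random.default_rng(child) for child in children]


# One stream for inter-arrival times, one for service times.
arrival_rng, service_rng = spawn_generators(user_seed, n_streams)
arrivals = arrival_rng.exponential(scale=12, size=5)
services = service_rng.exponential(scale=12, size=5)

# Re-spawning from the same user seed reproduces the same streams.
arrival_rng2, service_rng2 = spawn_generators(user_seed, n_streams)
arrivals_again = arrival_rng2.exponential(scale=12, size=5)
```

The two generators draw from different child seeds, so `arrivals` and `services` differ, while rebuilding from the same `user_seed` repeats `arrivals` exactly. That reproducibility is what makes the pattern convenient for replications and common random numbers.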

TomMonks · TomMonks · commit f0fcf4830bd1 · 2024-12-27T16:30:44.000Z
diff --git a/content/01_sampling.ipynb b/content/01_sampling.ipynb
@@ -9,7 +9,12 @@
     "\n",
     "If you are working in simulation modelling in Python, you will likely need to use `numpy.random` namespace. It provides a variety of statistical distributions which you can use for efficient sampling. \n",
     "\n",
-    "This notebook will guide you through an example of generating **100,000 samples** from the **uniform**, **exponential** and **normal** distributions."
+    "This notebook will guide you through examples of:\n",
+    "\n",
+    "1.  Creating instances of a high-quality pseudo-random number generator (PRNG), the PCG64 generator provided by `numpy`.\n",
+    "2.  Generating samples from the **uniform**, **exponential** and **normal** distributions.\n",
+    "3.  Spawning multiple non-overlapping streams of random numbers.\n",
+    "4.  Using OOP to encapsulate PRNGs, distributions and parameters for simulation models."
    ]
   },
   {
@@ -71,7 +76,7 @@
     "    '''\n",
     "    hist = np.histogram(samples, bins=np.arange(bins), \n",
     "                        density=True)\n",
-    "    \n",
+    "\n",
     "    fig = plt.figure(figsize=figsize)\n",
     "    ax = fig.add_subplot()\n",
     "    _ = ax.plot(hist[0])\n",
@@ -88,9 +93,13 @@
    "source": [
     "## 3. Creating a random number generator object\n",
     "\n",
-    "To generate random numbers for sampling from each distribution, we can use the `default_rng()` function from the `numpy.random` module.\n",
+    "To generate pseudo-random numbers for sampling from each distribution, we can use the `default_rng()` function from the `numpy.random` module.\n",
     "\n",
-    "This function constructs an instance of a `Generator` class, which can produce random numbers. For more information on `Generator` you can look at [`numpy` online documentation.](https://numpy.org/doc/stable/reference/random/generator.html)"
+    "This function constructs an instance of a `Generator` class, which can produce random numbers. \n",
+    "\n",
+    "By default, `numpy` uses a pseudo-random number generator (PRNG) called the [Permuted Congruential Generator 64-bit](https://www.pcg-random.org/) (PCG64; period = $2^{128}$; maximum number of streams = $2^{127}$).\n",
+    "\n",
+    "For more information on `Generator` you can look at [`numpy` online documentation.](https://numpy.org/doc/stable/reference/random/generator.html)"
    ]
   },
   {
@@ -147,7 +156,7 @@
     "samples = rng.uniform(low=10, high=40, size=1_000_000)\n",
     "\n",
     "# Illustrate with plot.\n",
-    "fig, ax = distribution_plot(samples, bins=50)"
+    "_ = distribution_plot(samples, bins=50)"
    ]
   },
   {
@@ -165,9 +174,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "rng = np.random.default_rng()\n",
+    "rng = np.random.default_rng(42)\n",
     "samples = rng.exponential(scale=12, size=1_000_000)\n",
-    "distribution_plot(samples, bins=50)"
+    "_ = distribution_plot(samples, bins=50)"
    ]
   },
   {
@@ -185,9 +194,9 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "rng = np.random.default_rng()\n",
+    "rng = np.random.default_rng(42)\n",
     "samples = rng.normal(loc=25.0, scale=5.0, size=1_000_000)\n",
-    "distribution_plot(samples, bins=50)"
+    "_ = distribution_plot(samples, bins=50)"
    ]
   },
   {
@@ -218,7 +227,7 @@
    "id": "6ed34200-c5ac-4a70-ae83-90adf48aee67",
    "metadata": {},
    "source": [
-    "Note that you can also set `size` to 1.  Just be aware that an array is returned. e.g."
+    "**Note:** you can also set `size` to 1. Be aware that an array is still returned, e.g."
    ]
   },
   {
@@ -238,6 +247,62 @@
     "print(sample[0])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "b4bbf593-fa4a-4347-acb8-2f3df370d6ef",
+   "metadata": {},
+   "source": [
+    "## 5. Spawning multiple non-overlapping PRN streams\n",
+    "\n",
+    "For simulation we ideally want multiple streams of random numbers that do not overlap (i.e. they are independent). This is straightforward to implement in Python: pass a user-provided integer seed to `SeedSequence`, then call `spawn()` with the number of independent streams required.\n",
+    "\n",
+    "> As a user we don't need to worry about the quality of the integer seed provided. Spawned streams are useful for implementing multiple replications and common random numbers."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "82a41ef0-7198-403d-9fc9-a73eb293655a",
+   "metadata": {},
+   "source": [
+    "Here's how we create the seeds from a single user-supplied seed. The returned variable `seeds` is a Python `list` of `SeedSequence` objects."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4c404fa0-0987-4bff-9a1d-e784c485e59f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "n_streams = 2\n",
+    "user_seed = 1\n",
+    "\n",
+    "seed_sequence = np.random.SeedSequence(user_seed)\n",
+    "seeds = seed_sequence.spawn(n_streams)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "002292d7-7bd8-4fec-9eb0-b9a2f6459b4f",
+   "metadata": {},
+   "source": [
+    "We use `seeds` when creating our PRNGs.  For example, one for inter-arrival times and one for service times."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "268d5a1b-0372-4d4e-b38d-75b3b9b8990e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# e.g. to model arrival times\n",
+    "arrival_rng = np.random.default_rng(seeds[0])\n",
+    "\n",
+    "# e.g. to model service times\n",
+    "service_rng = np.random.default_rng(seeds[1])"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "0dc23b86-1ae1-4da0-b393-97dcf884f442",
@@ -273,7 +338,7 @@
     "        mean: float\n",
     "            The mean of the exponential distribution\n",
     "\n",
-    "        random_seed: int, optional (default=None)\n",
+    "        random_seed: int | SeedSequence, optional (default=None)\n",
     "            A random seed to reproduce samples.  If set to none then a unique\n",
     "            sample is created.\n",
     "        '''\n",