|
6 | 6 |
|
7 | 7 | <link rel="stylesheet" href="./style.css">
|
8 | 8 | </script>
|
9 |
| - <script type="text/markdown"> |
10 |
| - Data loaders are special cells that run at build time via an interpreter rather than in the browser at runtime. Data loader cells are useful for preparing static data, ensuring consistency and stability, and greatly improving runtime performance. Think of data loaders as a generalization of [database connectors](./databases) that allow languages besides SQL. |
| 9 | + <script id="2" type="text/markdown"> |
| 10 | + **Data loaders** are special cells that run "ahead" at build time via an interpreter, rather than "live" when you view a notebook in the browser. Data loaders are useful for preparing static data, ensuring consistency and stability, and improving performance. Think of data loaders as a generalization of [database connectors](./databases) that allow languages besides SQL. |
11 | 11 | </script>
|
12 |
| - <script type="text/markdown"> |
13 |
| - For example, here is a Node.js data loader that says hello and reports the current version of Node.js: |
| 12 | + <script id="7" type="text/markdown"> |
| 13 | + Notebooks currently support Node.js and Python data loaders. We will likely add additional interpreters in the future. |
14 | 14 | </script>
|
15 |
| - <script type="application/vnd.node.javascript" format="text" pinned=""> |
16 |
| - process.stdout.write(`Hello from Node ${process.version}!`); |
| 15 | + <script id="3" type="text/markdown"> |
| 16 | + As an example, here is a trivial Python cell that says hello, and reports the current version of Python: |
17 | 17 | </script>
|
18 |
| - <script type="text/markdown"> |
| 18 | + <script id="10" type="text/x-python" pinned="" output="hello" format="text"> |
| 19 | + import platform |
| 20 | + |
| 21 | + print(f"Hello from Python {platform.python_version()}!", end="") |
| 22 | + </script> |
| 23 | + <script id="22" type="text/markdown"> |
| 24 | + The Python cell above uses the text **format**, and hence its value is displayed as a string. The **output** is given the name `hello`, allowing it to be referenced in JavaScript: |
| 25 | + </script> |
| 26 | + <script id="23" type="module" pinned=""> |
| 27 | + hello.toUpperCase() |
| 28 | + </script> |
| 29 | + <script id="24" type="text/markdown"> |
| 30 | + A variety of formats are supported, including these text-based formats: |
| 31 | + |
| 32 | + - `text` - a string |
| 33 | + - `json` - JSON |
| 34 | + - `csv` - comma-separated values |
| 35 | + - `tsv` - tab-separated values |
| 36 | + - `xml` - XML |
| 37 | + |
| 38 | + And these binary formats: |
| 39 | + |
| 40 | + - `arrow` - Apache Arrow IPC |
| 41 | + - `parquet` - Apache Parquet |
| 42 | + - `blob` - binary data as a `Blob` |
| 43 | + - `buffer` - binary data as an `ArrayBuffer` |
| 44 | + |
| 45 | + You can also generate images in `jpeg`, `gif`, `webp`, `png`, and `svg` format. And you can server-side render HTML content using the `html` format. |
| 46 | + </script> |
| 47 | + <script id="21" type="text/markdown"> |
| 48 | + As a more realistic example, below is a Node.js data loader cell that fetches download statistics for Observable Plot from npm. |
| 49 | + </script> |
| 50 | + <script id="4" type="application/vnd.node.javascript" pinned="" output="downloads" format="json"> |
| 51 | + async function getNpmDownloads( |
| 52 | + name, // name of package |
| 53 | + { |
| 54 | + end: max, // exclusive |
| 55 | + start: min // inclusive |
| 56 | + } |
| 57 | + ) { |
| 58 | + const data = []; |
| 59 | + for (let start = max, end; start > min; ) { |
| 60 | + end = start; |
| 61 | + start = addDate(start, -365); // fetch a year at a time |
| 62 | + if (start < min) start = min; |
| 63 | + const response = await fetch( |
| 64 | + `https://api.npmjs.org/downloads/range/${formatDate(start)}:${formatDate(addDate(end, -1))}${name ? `/${encodeURIComponent(name)}` : ``}` |
| 65 | + ); |
| 66 | + if (!response.ok) throw new Error(`fetch failed: ${response.status}`); |
| 67 | + const {downloads} = await response.json(); |
| 68 | + for (const {downloads: value, day: date} of downloads.reverse()) { |
| 69 | + data.push({date: new Date(date), value}); |
| 70 | + } |
| 71 | + } |
| 72 | + for (let i = data.length - 1; i >= 0; --i) { |
| 73 | + if (data[i].value > 0) { |
| 74 | + return data.slice(data[0].value > 0 ? 0 : 1, i + 1); // ignore npm reporting zero for today |
| 75 | + } |
| 76 | + } |
| 77 | + throw new Error("no data found"); |
| 78 | + } |
| 79 | + |
| 80 | + function formatDate(date) { |
| 81 | + return date.toISOString().slice(0, 10); |
| 82 | + } |
| 83 | + |
| 84 | + function addDate(date, n) { |
| 85 | + date = new Date(+date); |
| 86 | + date.setDate(date.getDate() + n); |
| 87 | + return date; |
| 88 | + } |
| 89 | + |
| 90 | + process.stdout.write( |
| 91 | + JSON.stringify( |
| 92 | + await getNpmDownloads("@observablehq/plot", { |
| 93 | + start: new Date("2022-09-01"), |
| 94 | + end: new Date("2025-09-01") |
| 95 | + }) |
| 96 | + ) |
| 97 | + ); |
| 98 | + </script> |
| 99 | + <script id="6" type="text/markdown"> |
19 | 100 | <aside>The cell above is JavaScript that runs in Node.js, unlike normal JavaScript cells that run in the browser.</aside>
|
20 | 101 |
|
21 |
| - The output of a data loader cell is automatically saved to a `.observable/cache` directory on your local file system alongside your notebooks. |
| 102 | + The output of a data loader cell is automatically saved to a `.observable/cache` directory on your local file system alongside your notebooks. Data snapshots are stable --- the data only updates if you re-run the data loader cell. In Observable Desktop, you can re-run a data loader cell by clicking the **Play** button, by hitting <span style="font-family: var(--sans-serif);">**shift-return**</span>, or by clicking on the query age in the cell toolbar. In Notebook Kit, delete the corresponding file from the `.observable/cache` directory; you can also use continuous deployment, such as GitHub Actions, to refresh data automatically. |
22 | 103 | </script>
|
23 |
| - <script type="text/markdown"> |
24 |
| - In Observable Desktop, you can re-run a data loader cell by clicking the **Play** button, by hitting <span style="font-family: var(--sans-serif);">**shift-return**</span>, or by clicking on the query age in the cell toolbar. In Notebook Kit, delete the corresponding file from the `.observable/cache` directory; you can also use continuous deployment, such as GitHub Actions, to refresh data automatically. |
| 104 | + <script id="27" type="text/markdown"> |
| 105 | + The Node.js cell above defines the `downloads` variable, which we use below to render an area chart with Observable Plot: |
25 | 106 | </script>
|
26 |
| - <script type="text/markdown"> |
27 |
| - Currently, Notebook Kit only supports the Node.js interpreter for data loader cells, but we plan on adding other interpreters, notably Python and R. |
| 107 | + <script id="26" type="module" pinned=""> |
| 108 | + Plot.plot({ |
| 109 | + width, |
| 110 | + x: {type: "utc"}, |
| 111 | + y: {grid: true, label: "downloads"}, |
| 112 | + marks: [ |
| 113 | + Plot.axisY({label: "Downloads per day"}), |
| 114 | + Plot.areaY(downloads, {x: "date", y: "value", curve: "step"}), |
| 115 | + Plot.tip(downloads, Plot.pointerX({x: "date", y: "value", tip: true})) |
| 116 | + ] |
| 117 | + }) |
28 | 118 | </script>
|
29 |
| - <script type="text/markdown"> |
30 |
| - To improve security, the Node.js interpreter uses [process-based permissions](https://nodejs.org/api/permissions.html). Node.js cells are only allowed to read files in the same directory as the notebook, with no other permissions. (We may offer a way to relax permissions in the future, but want to encourage safety; let us know if you run into issues.) |
| 119 | + <script id="28" type="text/markdown"> |
| 120 | + Here's a bit more about data loaders. |
| 121 | + </script> |
| 122 | + <script id="17" type="text/markdown"> |
| 123 | + ## Node.js data loaders |
| 124 | + |
| 125 | + Node.js data loaders require Node.js 22.12+ to be installed in one of the following locations: |
| 126 | + |
| 127 | + - `/opt/homebrew/bin/node` (Homebrew) |
| 128 | + - `/opt/local/bin/node` (MacPorts) |
| 129 | + - `/usr/local/bin/node` (official Node.js installer) |
| 130 | + - `/usr/bin/node` (operating system) |
31 | 131 | </script>
|
32 |
| - <script type="text/markdown"> |
| 132 | + <script id="8" type="text/markdown"> |
| 133 | + To improve security, the Node.js interpreter uses [process-based permissions](https://nodejs.org/api/permissions.html): Node.js cells are only allowed to read files in the same directory as the notebook, with no other permissions. (We may offer a way to relax permissions in the future, but want to encourage safety; let us know if you run into issues.) |
| 134 | + </script> |
| 135 | + <script id="9" type="text/markdown"> |
33 | 136 | Due to the above security restrictions, if you wish to import installed packages, they must be installed within the same directory as the notebook (_e.g._, if your notebook is in `docs`, packages must be installed in `docs/node_modules` with a `docs/package.json`).
|
34 | 137 | </script>
|
| 138 | + <script id="18" type="text/markdown"> |
| 139 | + ## Python data loaders |
| 140 | + |
| 141 | + Python data loaders require Python 3.12+ to be installed in one of the following locations: |
| 142 | + |
| 143 | + - `.venv/bin/python3` (venv) |
| 144 | + - `/opt/homebrew/bin/python3` (Homebrew) |
| 145 | + - `/opt/local/bin/python3` (MacPorts) |
| 146 | + - `/usr/local/bin/python3` (official Python installer) |
| 147 | + - `/usr/bin/python3` (operating system) |
| 148 | + </script> |
| 149 | + <script id="30" type="text/markdown"> |
| 150 | + If you have a virtual environment (`.venv`) in the same directory as the notebook, it will automatically be used. However, packages are not installed implicitly; you must install packages yourself, typically using `pip`. (And we recommend using `pip freeze` to create a `requirements.txt`.) |
| 151 | + </script> |
35 | 152 | </notebook>
|
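For reference, the text-based formats listed in the added documentation can be produced from a Python cell in the same way the `hello` example prints a string. Below is a minimal sketch, assuming (as the Python and Node.js examples in this diff suggest) that the cell's standard output becomes its value and that the cell is declared with `format="csv"`. The `to_csv` helper and the sample rows are hypothetical and not part of the commit.

```python
import csv
import io
import sys


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    # Use "\n" line endings rather than the csv module's default "\r\n".
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data; a real loader would read a file or call an API here.
rows = [
    {"date": "2025-01-01", "value": 3},
    {"date": "2025-01-02", "value": 5},
]

# Assumption: the interpreter captures standard output as the cell's value,
# so write the CSV payload and nothing else.
sys.stdout.write(to_csv(rows, ["date", "value"]))
```

In a notebook this would live in a `text/x-python` cell with an `output` name, so that JavaScript cells could reference the result by that name. How the `csv` format parses values (for example, whether numbers are typed) is not stated in the diff.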