Hi! As part of the OOD Appverse community, we're working to improve documentation consistency across Open OnDemand apps so that deployers at other sites can more easily evaluate, install, and adapt them.
We've put together a README template that covers the key sections deployers typically need when considering an app for their site.
After reviewing your current README, here's what we found:
Sections to add (not currently in your README):
- Screenshots
- Features
- Configuration (`form.yml` attributes table)
- Troubleshooting
- Testing
- Known Limitations
- References
- Acknowledgments
Sections that could be expanded:
- Overview -- could mention app type (Batch Connect `basic` template), Spark cluster support, JupyterLab/Notebook toggle, and link to the upstream Jupyter and Apache Spark projects
- Install -- could use the latest release tag (`v0.16.0`), add `form.yml` configuration guidance for deployers at other sites, and include site-specific customization steps
- Prerequisites -- could add Open OnDemand version and optional dependencies
Sections already present:
- Prerequisites (compute node software) -- well documented with Lmod, Jupyter Notebook, OpenSSL, and Apache Spark versions
- Contributing -- standard fork-and-PR workflow
- License -- dual-license (MIT for code, CC-BY-4.0 for docs) is clearly stated
Below we've provided two versions: a diff showing exactly what we're suggesting to add or change, and a clean copy-paste version you can drop in directly. Lines marked with `<!-- TODO -->` need your input -- we deliberately left those rather than guessing.
Diff view -- see exactly what's new and changed
# Batch Connect - OSC Jupyter Notebook + Spark

[License: MIT](https://opensource.org/licenses/MIT)
- An interactive app designed for OSC OnDemand that launches a Jupyter Notebook
- server and an Apache Spark cluster within an Owens batch job.
+ ## Overview
+
+ An [Open OnDemand](https://openondemand.org/) Batch Connect app that launches
+ a [Jupyter](https://jupyter.org/) Notebook (or Lab) server together with an
+ [Apache Spark](https://spark.apache.org/) cluster as an interactive session on
+ OSC HPC clusters. The app creates a PySpark kernel so users can run Spark jobs
+ directly from Jupyter notebooks.
+
+ This app uses the Batch Connect `basic` template with Slurm and supports
+ clusters: Pitzer, Cardinal, and Ascend.
+
+ - **Upstream projects:** [Jupyter](https://jupyter.org/), [Apache Spark](https://spark.apache.org/)
+ - **Batch Connect template:** `basic`
+ - **Scheduler:** Slurm
+
+ ## Screenshots
+
+ <!-- TODO: Add a screenshot of the app's launch form or a running session -->
+
+ ## Features
+
+ - Launches a Jupyter server with a built-in PySpark kernel connected to a Spark cluster
+ - Toggle between JupyterLab and Jupyter Notebook via checkbox
+ - Multi-cluster support (Pitzer, Cardinal, Ascend)
+ - Multi-node Spark clusters with configurable worker count per node
+ - Hugemem node type support for memory-intensive workloads
+ - Custom Spark configuration file support (override defaults)
+ - Supplementary environment variables file support
+ - Option to restrict driver process to master node only (for large `.collect`/`.take` operations)
+ - Optional OSC tutorial/workshop notebooks
+ - Configurable root directory for the Jupyter session
+ - Module-based software loading via Lmod (`project/ondemand`, `app_jupyter/`, Spark, Python)
- ## Prerequisites
+ ## Requirements
+
+ ### Compute Node Software
This Batch Connect app requires the following software be installed on the
**compute nodes** that the batch job is intended to run on (**NOT** the
OnDemand node):
- - [Lmod] 6.0.1+ or any other `module purge` and `module load <modules>` based
- CLI used to load appropriate environments within the batch job before
- launching the Jupyter Notebook server.
- - [Jupyter Notebook] 4.2.3+ (earlier versions are untested but may work for
- you)
- - [OpenSSL] 1.0.1+ (used to hash the Jupyter Notebook server password)
- - [Apache Spark] 2.1.0+
-
- [Apache Spark]: https://spark.apache.org/
- [Jupyter Notebook]: https://jupyter.org/
- [OpenSSL]: https://www.openssl.org/
- [Lmod]: https://www.tacc.utexas.edu/research-development/tacc-projects/lmod
+ - [Lmod](https://www.tacc.utexas.edu/research-development/tacc-projects/lmod)
+ 6.0.1+ or any other `module purge` and `module load <modules>` based CLI
+ - [Jupyter Notebook](https://jupyter.org/) 4.2.3+ (earlier versions are
+ untested but may work)
+ - [OpenSSL](https://www.openssl.org/) 1.0.1+ (used to hash the Jupyter
+ Notebook server password)
+ - [Apache Spark](https://spark.apache.org/) 2.1.0+
+
+ ### Open OnDemand
+
+ <!-- TODO: Specify the minimum OOD version this app has been tested with -->
+ - Slurm scheduler
+
+ ### Optional
+
+ - Python module matching your Conda environment version
- ## Install
+ ## App Installation
+
+ ### 1. Clone the repository
- Use Git to clone this app and checkout the desired branch/version you want to
- use:
```sh
- scl enable git19 -- git clone <repo>
- cd <dir>
- scl enable git19 -- git checkout <tag/branch>
+ cd /var/www/ood/apps/sys
+ git clone https://github.com/OSC/bc_osc_jupyter_spark.git
+ cd bc_osc_jupyter_spark
+
+ # Pin to a release (recommended)
+ git checkout v0.16.0
```
- You will not need to do anything beyond this as all necessary assets are
- installed. You will also not need to restart this app as it isn't a Passenger
- app.
-
- To update the app you would:
-
- ```sh
- cd <dir>
- scl enable git19 -- git fetch
- scl enable git19 -- git checkout <tag/branch>
- ```
+ No restart is needed -- Batch Connect apps are not Passenger apps and are
+ detected automatically.
+
+ ### 2. Configure for your site
+
+ Edit `form.yml` and update these values for your cluster:
+
+ | Attribute | OSC Default | Change to |
+ |--------------------------|------------------------------|----------------------------------|
+ | `cluster` | `pitzer`, `cardinal`, `ascend` | Your cluster name(s) |
+ | `auto_modules_spark` | auto-detected Spark modules | Spark modules on your system |
+ | `auto_modules_python` | auto-detected Python modules | Python modules on your system |
+ | `auto_modules_app_jupyter` | auto-detected Jupyter modules | Jupyter modules on your system |
+ | `node_type` | `any`, `hugemem` | Node types available on your cluster |
+
+ In `script.sh.erb`, the app loads modules with:
+ ```
+ module load project/ondemand <app_jupyter_module>
+ module load <python_module> <spark_module>
+ ```
+ Ensure equivalent modules are available on your system.
+
+ ### 3. Update the app
+
+ ```sh
+ cd /var/www/ood/apps/sys/bc_osc_jupyter_spark
+ git fetch
+ git checkout <tag>
+ ```
- Again, you do not need to restart the app as it isn't a Passenger app.
+ No restart is needed.
+
+ ## Configuration
+
+ ### form.yml attributes
+
+ | Attribute | Widget | Description | Default |
+ |----------------------------|-----------------|------------------------------------------------------------|------------------|
+ | `cluster` | select | Target cluster ID(s) | `pitzer`, `cardinal`, `ascend` |
+ | `jupyterlab_switch` | check_box | Use JupyterLab instead of Jupyter Notebook | unchecked |
+ | `working_dir` | path_selector | Root directory for the Jupyter session | `$HOME` |
+ | `auto_modules_spark` | auto-select | Apache Spark module to load | auto-detected |
+ | `auto_modules_python` | auto-select | Python module to load (match your Conda environment) | auto-detected |
+ | `auto_modules_app_jupyter` | auto-select | JupyterLab version module | auto-detected |
+ | `bc_num_hours` | number | Maximum wall time (hours) | <!-- TODO: specify default --> |
+ | `bc_num_slots` | number | Number of nodes for the Spark cluster | `1`+ |
+ | `node_type` | select | Compute node type (any, hugemem) | `any` |
+ | `num_workers` | number_field | Number of Spark workers per node | `1` |
+ | `spark_configuration_file` | path_selector | Override Spark defaults with a custom config file | empty |
+ | `supplement_env_file` | path_selector | Load additional environment variables before Spark startup | empty |
+ | `only_driver_on_root` | check_box | Only launch the driver on the master node | unchecked |
+ | `include_tutorials` | check_box | Include access to OSC tutorial/workshop notebooks | unchecked |
## Contributing
1. Fork it ( https://github.com/OSC/bc_osc_jupyter_spark/fork )
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request
+ For bugs or feature requests,
+ [open an issue](https://github.com/OSC/bc_osc_jupyter_spark/issues).
+
+ ## Troubleshooting
+
+ <!-- TODO: Add troubleshooting tips you've encountered -->
+
+ ## Testing
+
+ <!-- TODO: Update with sites where this app has been deployed -->
+
+ | Site | OOD Version | Scheduler | Status |
+ |---------------------------|----------------|-----------|------------|
+ | Ohio Supercomputer Center | <!-- TODO --> | Slurm | Production |
+
+ ## Known Limitations
+
+ <!-- TODO: Document any known limitations -->
+
+ ## References
+
+ - [Jupyter](https://jupyter.org/) -- interactive computing environment
+ - [Apache Spark](https://spark.apache.org/) -- unified analytics engine for large-scale data processing
+ - [Open OnDemand](https://openondemand.org/) -- the HPC portal framework
+ - [OOD Batch Connect app development docs](https://osc.github.io/ood-documentation/latest/app-development.html)
+ - [Changelog](https://github.com/OSC/bc_osc_jupyter_spark/blob/master/CHANGELOG.md)
+ -- release history for this app
## License
* Documentation, website content, and logo is licensed under
[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)
* Code is licensed under MIT (see LICENSE.txt)
* The Jupyter logo is a trademark of NumFOCUS foundation.
* Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation (ASF).
+ ## Acknowledgments
+
+ <!-- TODO: Add funding or institutional support information -->
Clean README.md -- copy-paste ready
# Batch Connect - OSC Jupyter Notebook + Spark

[License: MIT](https://opensource.org/licenses/MIT)
## Overview
An [Open OnDemand](https://openondemand.org/) Batch Connect app that launches
a [Jupyter](https://jupyter.org/) Notebook (or Lab) server together with an
[Apache Spark](https://spark.apache.org/) cluster as an interactive session on
OSC HPC clusters. The app creates a PySpark kernel so users can run Spark jobs
directly from Jupyter notebooks.
This app uses the Batch Connect `basic` template with Slurm and supports
clusters: Pitzer, Cardinal, and Ascend.
- **Upstream projects:** [Jupyter](https://jupyter.org/), [Apache Spark](https://spark.apache.org/)
- **Batch Connect template:** `basic`
- **Scheduler:** Slurm
## Screenshots
<!-- TODO: Add a screenshot of the app's launch form or a running session -->
## Features
- Launches a Jupyter server with a built-in PySpark kernel connected to a Spark cluster
- Toggle between JupyterLab and Jupyter Notebook via checkbox
- Multi-cluster support (Pitzer, Cardinal, Ascend)
- Multi-node Spark clusters with configurable worker count per node
- Hugemem node type support for memory-intensive workloads
- Custom Spark configuration file support (override defaults)
- Supplementary environment variables file support
- Option to restrict driver process to master node only (for large `.collect`/`.take` operations)
- Optional OSC tutorial/workshop notebooks
- Configurable root directory for the Jupyter session
- Module-based software loading via Lmod (`project/ondemand`, `app_jupyter/`, Spark, Python)
## Requirements
### Compute Node Software
This Batch Connect app requires the following software be installed on the
**compute nodes** that the batch job is intended to run on (**NOT** the
OnDemand node):
- [Lmod](https://www.tacc.utexas.edu/research-development/tacc-projects/lmod)
6.0.1+ or any other `module purge` and `module load <modules>` based CLI
- [Jupyter Notebook](https://jupyter.org/) 4.2.3+ (earlier versions are
untested but may work)
- [OpenSSL](https://www.openssl.org/) 1.0.1+ (used to hash the Jupyter
Notebook server password)
- [Apache Spark](https://spark.apache.org/) 2.1.0+
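To compare an installed version against the minimums listed above, a small helper along these lines may be useful. This is only a sketch: `version_ge` and `report_version` are hypothetical names (not part of this app), and the comparison relies on `sort -V`, a GNU coreutils extension that may be missing on non-GNU systems.

```sh
# Return success (0) when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` (version sort, GNU coreutils) is available.
version_ge() {
    # The smaller of the two versions sorts first; if that is the minimum
    # ($2), then the detected version ($1) is at least the minimum.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print a one-line verdict for a tool.
# $1 = label, $2 = detected version, $3 = minimum required version
report_version() {
    if version_ge "$2" "$3"; then
        printf '%s %s: ok (>= %s)\n' "$1" "$2" "$3"
    else
        printf '%s %s: too old (need %s+)\n' "$1" "$2" "$3"
    fi
}
```

For example, on a compute node you could run `report_version OpenSSL "$(openssl version | awk '{print $2}')" 1.0.1`. The exact output format of each tool's version command varies, so adjust the parsing to match what your system prints.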
### Open OnDemand
<!-- TODO: Specify the minimum OOD version this app has been tested with -->
- Slurm scheduler
### Optional
- Python module matching your Conda environment version
## App Installation
### 1. Clone the repository
```sh
cd /var/www/ood/apps/sys
git clone https://github.com/OSC/bc_osc_jupyter_spark.git
cd bc_osc_jupyter_spark
# Pin to a release (recommended)
git checkout v0.16.0
```
No restart is needed -- Batch Connect apps are not Passenger apps and are
detected automatically.
### 2. Configure for your site
Edit `form.yml` and update these values for your cluster:
| Attribute | OSC Default | Change to |
|--------------------------|------------------------------|----------------------------------|
| `cluster` | `pitzer`, `cardinal`, `ascend` | Your cluster name(s) |
| `auto_modules_spark` | auto-detected Spark modules | Spark modules on your system |
| `auto_modules_python` | auto-detected Python modules | Python modules on your system |
| `auto_modules_app_jupyter` | auto-detected Jupyter modules | Jupyter modules on your system |
| `node_type` | `any`, `hugemem` | Node types available on your cluster |
In `script.sh.erb`, the app loads modules with:
```
module load project/ondemand <app_jupyter_module>
module load <python_module> <spark_module>
```
Ensure equivalent modules are available on your system.
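One way to confirm this before launching a session is to ask the module system directly. The sketch below assumes your `module` command supports the `is-avail` subcommand (recent Lmod and Environment Modules releases provide it; older versions may not). `check_modules` is a hypothetical helper name, not something shipped with this app.

```sh
# Report which of the given module names the module system can resolve.
# Assumption: `module is-avail <name>` exits 0 when the module can be loaded.
check_modules() {
    missing=0
    for mod in "$@"; do
        if module is-avail "$mod" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$mod"
        else
            printf 'missing: %s\n' "$mod"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero when at least one module is missing.
    [ "$missing" -eq 0 ]
}
```

Run it on a compute node with the names your site uses, for example `check_modules project/ondemand spark/3.5.1 python/3.12`. The Spark and Python names here are placeholders, not modules known to exist at any site.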
### 3. Update the app
```sh
cd /var/www/ood/apps/sys/bc_osc_jupyter_spark
git fetch
git checkout <tag>
```
No restart is needed.
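If you prefer to look up the newest release rather than typing a tag by hand, a filter like the following may help. It is a sketch that assumes release tags follow the `vMAJOR.MINOR.PATCH` pattern (as `v0.16.0` does) and that `sort -V` from GNU coreutils is available; `latest_release_tag` is a hypothetical helper name.

```sh
# Print the highest vMAJOR.MINOR.PATCH tag read from standard input.
# Lines that do not match the pattern (branches, pre-release tags) are ignored.
latest_release_tag() {
    grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}
```

Inside the app directory, this could be used as `git fetch --tags && git checkout "$(git tag --list | latest_release_tag)"`. Review the changelog before moving to a new release.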
## Configuration
### form.yml attributes
| Attribute | Widget | Description | Default |
|----------------------------|-----------------|------------------------------------------------------------|------------------|
| `cluster` | select | Target cluster ID(s) | `pitzer`, `cardinal`, `ascend` |
| `jupyterlab_switch` | check_box | Use JupyterLab instead of Jupyter Notebook | unchecked |
| `working_dir` | path_selector | Root directory for the Jupyter session | `$HOME` |
| `auto_modules_spark` | auto-select | Apache Spark module to load | auto-detected |
| `auto_modules_python` | auto-select | Python module to load (match your Conda environment) | auto-detected |
| `auto_modules_app_jupyter` | auto-select | JupyterLab version module | auto-detected |
| `bc_num_hours` | number | Maximum wall time (hours) | <!-- TODO: specify default --> |
| `bc_num_slots` | number | Number of nodes for the Spark cluster | `1`+ |
| `node_type` | select | Compute node type (any, hugemem) | `any` |
| `num_workers` | number_field | Number of Spark workers per node | `1` |
| `spark_configuration_file` | path_selector | Override Spark defaults with a custom config file | empty |
| `supplement_env_file` | path_selector | Load additional environment variables before Spark startup | empty |
| `only_driver_on_root` | check_box | Only launch the driver on the master node | unchecked |
| `include_tutorials` | check_box | Include access to OSC tutorial/workshop notebooks | unchecked |
## Contributing
1. Fork it ( https://github.com/OSC/bc_osc_jupyter_spark/fork )
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request
For bugs or feature requests,
[open an issue](https://github.com/OSC/bc_osc_jupyter_spark/issues).
## Troubleshooting
<!-- TODO: Add troubleshooting tips you've encountered -->
## Testing
<!-- TODO: Update with sites where this app has been deployed -->
| Site | OOD Version | Scheduler | Status |
|---------------------------|----------------|-----------|------------|
| Ohio Supercomputer Center | <!-- TODO --> | Slurm | Production |
## Known Limitations
<!-- TODO: Document any known limitations -->
## References
- [Jupyter](https://jupyter.org/) -- interactive computing environment
- [Apache Spark](https://spark.apache.org/) -- unified analytics engine for large-scale data processing
- [Open OnDemand](https://openondemand.org/) -- the HPC portal framework
- [OOD Batch Connect app development docs](https://osc.github.io/ood-documentation/latest/app-development.html)
- [Changelog](https://github.com/OSC/bc_osc_jupyter_spark/blob/master/CHANGELOG.md)
-- release history for this app
## License
* Documentation, website content, and logo is licensed under
[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)
* Code is licensed under MIT (see LICENSE.txt)
* The Jupyter logo is a trademark of NumFOCUS foundation.
* Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation (ASF).
## Acknowledgments
<!-- TODO: Add funding or institutional support information -->
Feel free to use as much or as little of this as you'd like -- we're happy to discuss any of these suggestions or adjust them to better fit your project.
This review is part of the OOD Appverse Affinity Group documentation effort. If you're interested in collaborating on documentation standards for OOD apps, consider joining the Appverse Affinity Group.