armadaproject · naskio · Jan 15, 2025
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,34 @@
+# Docs
+
+This folder contains the documentation for the Armada project. The documentation is written in markdown and is rendered as webpages on [armadaproject.io](https://armadaproject.io).
+
+It's accessible from the IDE, GitHub, and the website.
+
+## For Developers
+
+See [website.md](./developer/website.md)
+
+## Overview
+
+Docs added to the `docs/` folder are automatically copied into [armadaproject.io](https://armadaproject.io).
+
+For example, if you wanted to document bananas, and you added `bananas.md`,
+once committed to master that would be published at
+`https://armadaproject.io/bananas/`.
+
+> [!NOTE]  
+> All files in the `docs/` folder are rendered as webpages except this `README.md` file.
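+
+The mapping from file to URL can be sketched as a small shell function. This is only an illustration based on the `bananas.md` example: `docs_url` is a hypothetical helper, and the handling of files in subfolders is an assumption rather than documented behaviour.
+
+```bash
+# Hypothetical helper: print the URL a file under docs/ is expected to be
+# published at, following the bananas.md example.
+docs_url() {
+  path="${1#docs/}"            # drop the leading docs/ folder
+  case "$path" in
+    README.md) return 1 ;;     # docs/README.md is not rendered as a webpage
+  esac
+  printf 'https://armadaproject.io/%s/\n' "${path%.md}"
+}
+
+docs_url docs/bananas.md       # prints https://armadaproject.io/bananas/
+```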
+
+## Pages with assets
+
+If you'd like to add a more complex page, such as one with images or other
+linked assets, you have to be careful to ensure links will work both
+for people viewing in GitHub and for those viewing via [armadaproject.io](https://armadaproject.io).
+
+The easiest way to accomplish this is by using page bundles. Assets should be located inside the `docs/` folder and
+used in the markdown file with relative paths.
+
+## Removing pages
+
+Any page that is removed from the `docs/` folder will be removed from the website automatically. The `docs/` folder is
+the source of truth for the website's content.
diff --git a/docs/consistency.md b/docs/consistency.md
@@ -7,7 +7,7 @@ Armada stores its state across several databases. Whenever Armada receives an AP
 There are three commonly used approaches to address this issue:
 
 * Store all state in a single database with support for transactions. Changes are submitted atomically and are rolled back in case of failure; there are no partial failures.
-* Distributed transaction frameworks (e.g., X/Open XA), which extend the notation of transactions to operations involving several databases.
+* Distributed transaction frameworks (e.g., X/Open XA), which extend the notion of transactions to operations involving several databases.
 * Ordered idempotent updates.
 
 The first approach results in tight coupling between components and would limit us to a single database technology. Adding a new component (e.g., a new dashboard) could break existing component since all operations part of the transaction are rolled back if one fails. The second approach allows us to use multiple databases (as long as they support the distributed transaction framework), but components are still tightly coupled since they have to be part of the same transaction. Further, there are performance concerns associated with these options, since transactions may not be easily scalable. Hence, we use the third approach, which we explain next.

diff --git a/docs/demo.md b/docs/demo.md
@@ -0,0 +1,144 @@
+# Armada Demo
+
+<div class="responsive-video">
+<iframe width="560" height="315" src="https://www.youtube.com/embed/l76yh1VjhaY" title="Armada demo video" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
+</div>
+
+> <small><i>This video demonstrates the use of Armadactl, Armada Lookout UI, and Apache Airflow.</i></small>
+
+This guide will show you how to take a quick test drive of an Armada
+instance already deployed to AWS EKS.
+
+## EKS
+
+The Armada UI (Lookout) can be found at this URL:
+
+- [https://ui.demo.armadaproject.io](https://ui.demo.armadaproject.io)
+
+## Local prerequisites
+
+- Git
+- Go 1.20
+
+## Obtain the Armada source
+Clone [this](https://github.com/armadaproject/armada) repository:
+
+```bash
+git clone https://github.com/armadaproject/armada.git
+cd armada
+```
+
+All commands are intended to be run from the root of the repository.
+
+## Set up an easy-to-use alias
+If you are on Windows, run this command in a terminal that supports Linux-style shell commands, for example [Git Bash](https://git-scm.com/downloads) or [Hyper](https://hyper.is/).
+```bash
+alias armadactl='go run cmd/armadactl/main.go --armadaUrl armada.demo.armadaproject.io:443'
+```
+
+## Create queues and jobs
+Create queues, submit some jobs, and monitor progress:
+
+### Queue Creation
+Use a unique name for the queue. Make sure you remember it for the next steps.
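+
+One way to get a unique name and keep it available for the next steps is to store it in a shell variable. The naming scheme below (a `demo-` prefix, your user name, and a timestamp) is only an assumption for illustration; any unique name works.
+
+```bash
+# Hypothetical naming scheme: user name plus a timestamp keeps the name
+# unique on the shared demo instance.
+QUEUE_NAME="demo-${USER:-user}-$(date +%s)"
+export QUEUE_NAME
+echo "Using queue: $QUEUE_NAME"
+```
+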
+```bash
+armadactl create queue $QUEUE_NAME --priorityFactor 1
+armadactl create queue $QUEUE_NAME --priorityFactor 2
+```
+
+For queues created in this way, user and group owners of the queue have permissions to:
+- submit jobs
+- cancel jobs
+- reprioritize jobs
+- watch queue
+
+For more control, queues can be created from a file via `armadactl create -f`, which allows setting specific permissions; see the following example.
+
+```bash
+armadactl create -f ./docs/quickstart/queue-a.yaml
+armadactl create -f ./docs/quickstart/queue-b.yaml
+```
+
+Before running the commands above, edit both of these `yaml` files in a code or text editor so that the `name` field is set to your queue name:
+
+```yaml
+name: $QUEUE_NAME
+```
+
+### Job Submission
+```bash
+armadactl submit ./docs/quickstart/job-queue-a.yaml
+armadactl submit ./docs/quickstart/job-queue-b.yaml
+```
+
+Before running the commands above, edit both of these `yaml` files in a code or text editor so that the `queue` field is set to your queue name:
+```yaml
+queue: $QUEUE_NAME
+```
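+
+The manual edits to the queue and job files described above could also be scripted. The sketch below uses a hypothetical helper, `set_yaml_field`, and assumes that the field sits at the start of a line and that the value contains no characters special to `sed` (such as `/` or `&`).
+
+```bash
+# Hypothetical helper: set a top-level "key: value" line in a YAML file.
+# Usage: set_yaml_field FILE KEY VALUE
+set_yaml_field() {
+  file="$1" key="$2" value="$3"
+  # Write to a temporary file first so a failed sed leaves the original intact.
+  sed "s/^${key}:.*/${key}: ${value}/" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
+}
+
+# Example calls, run from the root of the repository:
+#   set_yaml_field ./docs/quickstart/queue-a.yaml name "$QUEUE_NAME"
+#   set_yaml_field ./docs/quickstart/job-queue-a.yaml queue "$QUEUE_NAME"
+```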
+
+### Monitor Job Progress
+
+```bash
+armadactl watch $QUEUE_NAME job-set-1
+```
+
+Try submitting lots of jobs and see queues get built and processed:
+
+#### Windows (using Git Bash):
+
+Use a text editor of your choice.
+Copy and paste the following lines into the text editor:
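+```bash
+#!/bin/bash
+
+# Aliases from your interactive shell are not available inside a script,
+# so define armadactl as a function with the same arguments as the alias.
+armadactl() {
+  go run cmd/armadactl/main.go --armadaUrl armada.demo.armadaproject.io:443 "$@"
+}
+
+for i in {1..50}
+do
+  armadactl submit ./docs/quickstart/job-queue-a.yaml
+  armadactl submit ./docs/quickstart/job-queue-b.yaml
+done
+```
+Save the file with a `.sh` extension (e.g., `myscript.sh`) in the root directory of the project.
+Open Git Bash, navigate to the project's directory using the `cd` command, and then run the script by typing `./myscript.sh` and pressing Enter.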
+
+#### Linux:
+
+1. Open a text editor in the terminal and create a new file, e.g. `nano myscript.sh` (replace `nano` with your preferred editor, such as Vim).
+2. Copy and paste the script content from above into the text editor.
+3. Save the file and exit the text editor.
+4. Make the script file executable: `chmod +x myscript.sh`.
+5. Run the script: `./myscript.sh`.
+
+#### macOS:
+
+Follow the same steps as for Linux. Recent macOS versions use zsh as the default shell, but the `#!/bin/bash` line at the top of the script ensures that it runs with Bash.
+With this approach, you create a shell script file that contains your multi-line script, and you can run it as a whole by executing the script file in the terminal.
+
+## Observing job progress
+
+CLI:
+
+```bash
+$ armadactl watch queue-a job-set-1
+Watching job set job-set-1
+Nov  4 11:43:36 | Queued:   0, Leased:   0, Pending:   0, Running:   0, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobSubmittedEvent, job id: 01drv3mey2mzmayf50631tzp9m
+Nov  4 11:43:36 | Queued:   1, Leased:   0, Pending:   0, Running:   0, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobQueuedEvent, job id: 01drv3mey2mzmayf50631tzp9m
+Nov  4 11:43:36 | Queued:   1, Leased:   0, Pending:   0, Running:   0, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobSubmittedEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
+Nov  4 11:43:36 | Queued:   2, Leased:   0, Pending:   0, Running:   0, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobQueuedEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
+Nov  4 11:43:38 | Queued:   1, Leased:   1, Pending:   0, Running:   0, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobLeasedEvent, job id: 01drv3mey2mzmayf50631tzp9m
+Nov  4 11:43:38 | Queued:   0, Leased:   2, Pending:   0, Running:   0, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobLeasedEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
+Nov  4 11:43:38 | Queued:   0, Leased:   1, Pending:   1, Running:   0, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobPendingEvent, job id: 01drv3mey2mzmayf50631tzp9m
+Nov  4 11:43:38 | Queued:   0, Leased:   0, Pending:   2, Running:   0, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobPendingEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
+Nov  4 11:43:41 | Queued:   0, Leased:   0, Pending:   1, Running:   1, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobRunningEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
+Nov  4 11:43:41 | Queued:   0, Leased:   0, Pending:   0, Running:   2, Succeeded:   0, Failed:   0, Cancelled:   0 | event: *api.JobRunningEvent, job id: 01drv3mey2mzmayf50631tzp9m
+Nov  4 11:44:17 | Queued:   0, Leased:   0, Pending:   0, Running:   1, Succeeded:   1, Failed:   0, Cancelled:   0 | event: *api.JobSucceededEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
+Nov  4 11:44:26 | Queued:   0, Leased:   0, Pending:   0, Running:   0, Succeeded:   2, Failed:   0, Cancelled:   0 | event: *api.JobSucceededEvent, job id: 01drv3mey2mzmayf50631tzp9m
+```
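+
+If you save output like the above to a file, the events can be tallied by type with standard tools. This is a minimal sketch: `count_events` is a hypothetical helper, and it assumes each event line contains `event: *api.<EventName>,` exactly as in the sample output.
+
+```bash
+# Hypothetical helper: read watch output on stdin and print "<event> <count>"
+# for each event type. Lines without an event (such as the header) are skipped.
+count_events() {
+  sed -n 's/.*event: \*api\.\([A-Za-z]*\),.*/\1/p' | sort | uniq -c | awk '{ print $2, $1 }'
+}
+
+# Example call, assuming the output was saved to watch.log:
+#   count_events < watch.log
+```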
+
+Web UI:
+
+Open [https://ui.demo.armadaproject.io](https://ui.demo.armadaproject.io) in your browser.
+
+![Lookout UI](./quickstart/img/lookout.png "Lookout UI")
diff --git a/docs/design/README.md b/docs/design/README.md
@@ -0,0 +1,76 @@
+# System overview
+
+This document is meant to be an overview of Armada for new users. We cover the architecture of Armada, show how jobs are represented, and explain how jobs are queued and scheduled.
+
+If you just want to learn how to submit jobs to Armada, see:
+
+- [User guide](../user.md)
+
+If you want to see a quick overview of Armada's components, see:
+
+- [Relationships diagram](./relationships_diagram.md)
+
+## Architecture
+
+Armada consists of two main components:
+- The Armada server, which is responsible for accepting jobs from users and deciding in what order, and on which Kubernetes cluster, jobs should run. Users submit jobs to the Armada server through the `armadactl` command-line utility or via a gRPC or REST API.
+- The Armada executor, of which there is one instance running in each Kubernetes cluster that Armada is connected to. Each Armada executor instance regularly notifies the server of how much spare capacity it has available and requests jobs to run. Users of Armada never interact with the executor directly.
+
+All state relating to the Armada server is stored in [Redis](https://redis.io/), which may use replication combined with failover for redundancy. Hence, the Armada server is itself stateless and is easily replicated by running multiple independent instances. Both the server and the executors are intended to be run in Kubernetes pods. We show a diagram of the architecture below.
+
+![How Armada works](./batch-api.svg)
+
+### Job leasing
+
+To avoid jobs being lost if a cluster or its executor becomes unavailable, each job assigned to an executor has an associated timeout. Armada executors are required to check in with the server regularly and if an executor responsible for running a particular job fails to check in within that timeout, the server will re-schedule the job on another cluster.
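+
+The timeout check described above can be sketched as follows. This only illustrates the idea and is not Armada's implementation; `lease_expired`, its arguments, and the use of seconds are assumptions.
+
+```bash
+# Hypothetical helper: succeeds if the executor's last check-in is older than
+# the timeout. Usage: lease_expired LAST_CHECKIN NOW TIMEOUT (all in seconds)
+lease_expired() {
+  [ $(( $2 - $1 )) -gt "$3" ]
+}
+
+# Last check-in at t=100, now t=200, timeout 60 seconds: the lease has expired.
+if lease_expired 100 200 60; then
+  echo "lease expired: re-schedule the job on another cluster"
+fi
+```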
+
+## Jobs and job sets
+
+A job is the most basic unit of work in Armada, and is represented by a Kubernetes pod specification (podspec) with additional metadata specific to Armada. Armada handles creating, running, and removing containers as necessary for each job. Hence, Armada is essentially a system for managing the life cycle of a set of containerised applications representing a batch job.
+
+The Armada workflow is:
+
+1. Create a job specification, which is a Kubernetes podspec with a few additional metadata fields.
+2. Submit the job specification to one of Armada's job queues using the `armadactl` CLI utility or through the Armada gRPC or REST API.
+
+For example, a job that sleeps for 60 seconds could be represented by the following yaml file.
+
+```yaml
+queue: test
+jobSetId: set1
+jobs:
+  - priority: 0
+    podSpecs:
+      - terminationGracePeriodSeconds: 0
+        restartPolicy: Never
+        containers:
+          - name: sleep
+            imagePullPolicy: IfNotPresent
+            image: busybox:latest
+            args:
+              - sleep
+              - 60s
+            resources:
+              limits:
+                memory: 64Mi
+                cpu: 150m
+              requests:
+                memory: 64Mi
+                cpu: 150m
+```
+
+In the above yaml snippet, each entry in `podSpecs` is a Kubernetes podspec, which consists of one or more containers that contain the user code to be run. In addition, the job specification (jobspec) contains metadata fields specific to Armada:
+
+- `queue`: which of the available job queues the job should be submitted to.
+- `priority`: the job priority (lower values indicate higher priority).
+- `jobSetId`: jobs with the same `jobSetId` can be followed and cancelled in a single operation. The `jobSetId` has no impact on scheduling.
+
+Queues and scheduling are explained in more detail below.
+
+For more examples, see the [user guide](../user.md).
+
+### Job events
+
+A job event is generated whenever the state of a job changes (e.g., when changing from submitted to running or from running to completed) and is a timestamped message containing event-specific information (e.g., an exit code for a completed job). All events generated by jobs that are part of the same job set are grouped together and published via a [Redis stream](https://redis.io/topics/streams-intro). Each job set has its own stream, which makes it possible to subscribe only to events generated by jobs in a particular set; this can be done via the Armada API.
+
+Armada records all events necessary to reconstruct the state of each job and, after a job has completed, the only information retained about the job is the set of events it generated.
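+
+Reconstructing state from events can be illustrated by replaying an ordered event log and keeping the last event seen for each job. The sketch below is not Armada's implementation; `latest_state` and the simplified `<job id> <event>` input format are assumptions.
+
+```bash
+# Hypothetical helper: read "<job id> <event>" pairs on stdin, in the order the
+# events occurred, and print the most recent event for each job.
+latest_state() {
+  awk '{ state[$1] = $2 } END { for (id in state) print id, state[id] }' | sort
+}
+
+printf 'job-1 submitted\njob-2 submitted\njob-1 running\njob-1 succeeded\n' | latest_state
+```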