Centralize documentation #4164

Open: wants to merge 17 commits into `master`.
34 changes: 34 additions & 0 deletions docs/README.md
@@ -0,0 +1,34 @@
# Docs

This folder contains the documentation for the Armada project. The documentation is written in Markdown and is rendered as web pages on [armadaproject.io](https://armadaproject.io).

It can be read in an IDE, on GitHub, or on the website.

## For Developers

See [website.md](./developer/website.md)

## Overview

Docs added to the `docs/` folder are automatically copied into [armadaproject.io](https://armadaproject.io).

For example, if you wanted to document bananas and added `bananas.md`,
once it is committed to master it would be published at
`https://armadaproject.io/bananas/`.
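The mapping from file to URL can be sketched as below. This is a hypothetical helper, not part of the website build: only the documented top-level case (`bananas.md` to `/bananas/`) is known, and treating nested paths the same way is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// docURL returns the URL a file under docs/ is assumed to be published at.
// Only the top-level case (bananas.md -> /bananas/) is documented; treating
// nested paths the same way is an assumption made for illustration.
func docURL(relPath string) (string, bool) {
	// README.md is not rendered, and non-Markdown files are assets, not pages.
	if relPath == "README.md" || !strings.HasSuffix(relPath, ".md") {
		return "", false
	}
	slug := strings.TrimSuffix(relPath, ".md")
	return "https://armadaproject.io/" + slug + "/", true
}

func main() {
	for _, p := range []string{"bananas.md", "README.md", "quickstart/img/lookout.png"} {
		if url, ok := docURL(p); ok {
			fmt.Printf("%s -> %s\n", p, url)
		} else {
			fmt.Printf("%s -> not published as a page\n", p)
		}
	}
}
```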

> [!NOTE]
> All files in the `docs/` folder are rendered as web pages except this `README.md` file.

## Pages with assets

If you'd like to add a more complex page, such as one with images or other
linked assets, you have to be careful to ensure links will work both
for people viewing on GitHub and for those viewing via [armadaproject.io](https://armadaproject.io).

The easiest way to accomplish this is by using page bundles. Assets should be located inside the `docs/` folder and
used in the markdown file with relative paths.
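A quick way to catch links that will break on one of the two sites is to flag image targets that are not relative paths. The following is a hypothetical lint sketch, not an existing project check; it assumes standard Markdown image syntax and treats absolute paths (`/img/...`) and remote URLs as violations of the "assets inside `docs/`, relative paths" rule.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// imageLink matches Markdown image syntax ![alt](target "optional title")
// and captures the target, stopping at whitespace or the closing paren.
var imageLink = regexp.MustCompile(`!\[[^\]]*\]\(([^)\s]+)`)

// nonRelativeAssets returns image targets that are absolute paths or URLs,
// which are likely to resolve differently on GitHub and on the website.
func nonRelativeAssets(markdown string) []string {
	var bad []string
	for _, m := range imageLink.FindAllStringSubmatch(markdown, -1) {
		target := m[1]
		u, err := url.Parse(target)
		if err != nil || u.Scheme != "" || u.Host != "" || strings.HasPrefix(target, "/") {
			bad = append(bad, target)
		}
	}
	return bad
}

func main() {
	doc := `![Lookout UI](./quickstart/img/lookout.png "Lookout UI")
![Logo](/img/logo.png)`
	fmt.Println(nonRelativeAssets(doc)) // [/img/logo.png]
}
```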

## Removing pages

Any page that is removed from the `docs/` folder will be removed from the website automatically. The `docs/` folder is
the source of truth for the website's content.
2 changes: 1 addition & 1 deletion docs/consistency.md
@@ -7,7 +7,7 @@ Armada stores its state across several databases. Whenever Armada receives an AP
There are three commonly used approaches to address this issue:

* Store all state in a single database with support for transactions. Changes are submitted atomically and are rolled back in case of failure; there are no partial failures.
* Distributed transaction frameworks (e.g., X/Open XA), which extend the notion of transactions to operations involving several databases.
* Ordered idempotent updates.

The first approach results in tight coupling between components and would limit us to a single database technology. Adding a new component (e.g., a new dashboard) could break existing components, since all operations that are part of the transaction are rolled back if one fails. The second approach allows us to use multiple databases (as long as they support the distributed transaction framework), but components are still tightly coupled since they have to be part of the same transaction. Further, both options raise performance concerns, since transactions may not scale easily. Hence, we use the third approach, which we explain next.
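As a preview, the general idea of ordered idempotent updates can be sketched as follows. This is a generic illustration, not Armada's implementation: it assumes every update carries a strictly increasing sequence number (for example, its position in a log) and that each database stores the last sequence number it applied alongside its data.

```go
package main

import "fmt"

// Update is a hypothetical state change carrying its position in an ordered log.
type Update struct {
	Seq   int64 // strictly increasing position in the log
	JobID string
	State string
}

// Store stands in for one of several databases. It remembers the sequence
// number of the last update it applied. In a real database, the state change
// and lastSeq would be written in one local transaction.
type Store struct {
	lastSeq int64
	states  map[string]string
}

func NewStore() *Store { return &Store{lastSeq: -1, states: map[string]string{}} }

// Apply applies u unless it was already applied. Because updates arrive in
// order, "already applied" is just Seq <= lastSeq, so replaying part of the
// log after a failure is harmless.
func (s *Store) Apply(u Update) bool {
	if u.Seq <= s.lastSeq {
		return false // duplicate delivery: skip
	}
	s.states[u.JobID] = u.State
	s.lastSeq = u.Seq
	return true
}

func main() {
	entries := []Update{{0, "job-1", "queued"}, {1, "job-1", "running"}, {2, "job-1", "succeeded"}}
	s := NewStore()
	for _, u := range entries[:2] {
		s.Apply(u)
	}
	// Simulate a crash followed by a replay of the whole log.
	for _, u := range entries {
		s.Apply(u)
	}
	fmt.Println(s.states["job-1"], s.lastSeq) // succeeded 2
}
```

Each database only needs local atomicity, so components stay decoupled: a new consumer can start from the beginning of the log and catch up independently.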
144 changes: 144 additions & 0 deletions docs/demo.md
@@ -0,0 +1,144 @@
# Armada Demo

<div class="responsive-video">
<iframe width="560" height="315" src="https://www.youtube.com/embed/l76yh1VjhaY" title="Armada demo video" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
</div>

> <small><i>This video demonstrates the use of Armadactl, Armada Lookout UI, and Apache Airflow.</i></small>

This guide will show you how to take a quick test drive of an Armada
instance already deployed to AWS EKS.

## EKS

The Armada UI (Lookout) is available at:

- [https://ui.demo.armadaproject.io](https://ui.demo.armadaproject.io)

## Local prerequisites

- Git
- Go 1.20

## Obtain the Armada source
Clone the [Armada](https://github.com/armadaproject/armada) repository:

```bash
git clone https://github.com/armadaproject/armada.git
cd armada
```

All commands are intended to be run from the root of the repository.

## Set up an easy-to-use alias
If you are on Windows, run this command in a terminal with a Bash-compatible shell, for example [Git Bash](https://git-scm.com/downloads) or [Hyper](https://hyper.is/).
```bash
alias armadactl='go run cmd/armadactl/main.go --armadaUrl armada.demo.armadaproject.io:443'
```

## Create queues and jobs
Create queues, submit some jobs, and monitor progress:

### Queue Creation
Choose two unique queue names and remember them for the next steps. The commands below store them in shell variables; replace the example values with your own:
```bash
QUEUE_NAME_A=my-unique-queue-a  # replace with your own unique name
QUEUE_NAME_B=my-unique-queue-b  # replace with your own unique name
armadactl create queue $QUEUE_NAME_A --priorityFactor 1
armadactl create queue $QUEUE_NAME_B --priorityFactor 2
```

For queues created in this way, user and group owners of the queue have permissions to:
- submit jobs
- cancel jobs
- reprioritize jobs
- watch queue

For more control, queues can instead be created from a file via `armadactl create`, which allows setting specific permissions. Use this as an alternative to the commands above; see the following example.

```bash
armadactl create -f ./docs/quickstart/queue-a.yaml
armadactl create -f ./docs/quickstart/queue-b.yaml
```

Before running the commands above, edit both `yaml` files in a code or text editor so that `queue-a.yaml` uses your first queue name and `queue-b.yaml` uses your second:

```yaml
name: my-unique-queue-a  # in queue-b.yaml: my-unique-queue-b
```

### Job Submission
```bash
armadactl submit ./docs/quickstart/job-queue-a.yaml
armadactl submit ./docs/quickstart/job-queue-b.yaml
```

Before running the commands above, edit both `yaml` files in the same way so that each job file targets the matching queue:
```yaml
queue: my-unique-queue-a  # in job-queue-b.yaml: my-unique-queue-b
```

### Monitor Job Progress

```bash
armadactl watch $QUEUE_NAME_A job-set-1
```
```bash
armadactl watch $QUEUE_NAME_B job-set-1
```

Try submitting lots of jobs and see queues get built and processed:

#### Windows (using Git Bash):

Open a text editor of your choice and paste in the following script:
```bash
#!/bin/bash

# Aliases are not expanded in non-interactive scripts, so define armadactl
# as a function that forwards all arguments to the CLI.
armadactl() {
  go run cmd/armadactl/main.go --armadaUrl armada.demo.armadaproject.io:443 "$@"
}

for i in {1..50}
do
  armadactl submit ./docs/quickstart/job-queue-a.yaml
  armadactl submit ./docs/quickstart/job-queue-b.yaml
done
```
Save the file with a `.sh` extension (e.g., `myscript.sh`) in the root directory of the project.
Open Git Bash, navigate to the project's directory using the `cd` command, and run the script by typing `./myscript.sh` and pressing Enter.

#### Linux:

1. Open a text editor (e.g., Nano or Vim) in the terminal and create a new file, for example by running `nano myscript.sh` (replace `nano` with your preferred editor if needed).
2. Paste in the script from above.
3. Save the file and exit the editor.
4. Make the script executable by running `chmod +x myscript.sh`.
5. Run the script by typing `./myscript.sh` in the terminal and pressing Enter.

#### macOS:

Follow the same steps as for Linux. Recent macOS versions use zsh as the default shell, but the script's `#!/bin/bash` line makes it run under Bash regardless.
In each case, running the script submits both job files 50 times in one go.

## Observing job progress

CLI:

```bash
$ armadactl watch queue-a job-set-1
Watching job set job-set-1
Nov 4 11:43:36 | Queued: 0, Leased: 0, Pending: 0, Running: 0, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobSubmittedEvent, job id: 01drv3mey2mzmayf50631tzp9m
Nov 4 11:43:36 | Queued: 1, Leased: 0, Pending: 0, Running: 0, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobQueuedEvent, job id: 01drv3mey2mzmayf50631tzp9m
Nov 4 11:43:36 | Queued: 1, Leased: 0, Pending: 0, Running: 0, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobSubmittedEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
Nov 4 11:43:36 | Queued: 2, Leased: 0, Pending: 0, Running: 0, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobQueuedEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
Nov 4 11:43:38 | Queued: 1, Leased: 1, Pending: 0, Running: 0, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobLeasedEvent, job id: 01drv3mey2mzmayf50631tzp9m
Nov 4 11:43:38 | Queued: 0, Leased: 2, Pending: 0, Running: 0, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobLeasedEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
Nov 4 11:43:38 | Queued: 0, Leased: 1, Pending: 1, Running: 0, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobPendingEvent, job id: 01drv3mey2mzmayf50631tzp9m
Nov 4 11:43:38 | Queued: 0, Leased: 0, Pending: 2, Running: 0, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobPendingEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
Nov 4 11:43:41 | Queued: 0, Leased: 0, Pending: 1, Running: 1, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobRunningEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
Nov 4 11:43:41 | Queued: 0, Leased: 0, Pending: 0, Running: 2, Succeeded: 0, Failed: 0, Cancelled: 0 | event: *api.JobRunningEvent, job id: 01drv3mey2mzmayf50631tzp9m
Nov 4 11:44:17 | Queued: 0, Leased: 0, Pending: 0, Running: 1, Succeeded: 1, Failed: 0, Cancelled: 0 | event: *api.JobSucceededEvent, job id: 01drv3mf7b6fd1rraeq1f554fn
Nov 4 11:44:26 | Queued: 0, Leased: 0, Pending: 0, Running: 0, Succeeded: 2, Failed: 0, Cancelled: 0 | event: *api.JobSucceededEvent, job id: 01drv3mey2mzmayf50631tzp9m
```
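The counts in each line follow from simple bookkeeping: every event moves one job out of its previous state and into a new one. The sketch below replays the event sequence shown above to reproduce those counts. It is an illustration, not armadactl's code; the `JobFailedEvent` and `JobCancelledEvent` mappings are assumed by analogy because they do not appear in the output.

```go
package main

import (
	"fmt"
	"strings"
)

// states lists the counted job states in the order the watch output prints them.
var states = []string{"Queued", "Leased", "Pending", "Running", "Succeeded", "Failed", "Cancelled"}

// eventState maps an event type to the state it moves a job into.
// JobSubmittedEvent is absent because it does not change any count.
var eventState = map[string]string{
	"JobQueuedEvent":    "Queued",
	"JobLeasedEvent":    "Leased",
	"JobPendingEvent":   "Pending",
	"JobRunningEvent":   "Running",
	"JobSucceededEvent": "Succeeded",
	"JobFailedEvent":    "Failed",    // assumed by analogy
	"JobCancelledEvent": "Cancelled", // assumed by analogy
}

// Tally tracks each job's current state and the per-state counts.
type Tally struct {
	current map[string]string // job id -> current counted state
	counts  map[string]int
}

func NewTally() *Tally {
	return &Tally{current: map[string]string{}, counts: map[string]int{}}
}

// Observe moves a job out of its previous state (if any) into the new one.
func (t *Tally) Observe(eventType, jobID string) {
	next, ok := eventState[eventType]
	if !ok {
		return
	}
	if prev, seen := t.current[jobID]; seen {
		t.counts[prev]--
	}
	t.current[jobID] = next
	t.counts[next]++
}

// String formats the counts like the watch output.
func (t *Tally) String() string {
	parts := make([]string, len(states))
	for i, s := range states {
		parts[i] = fmt.Sprintf("%s: %d", s, t.counts[s])
	}
	return strings.Join(parts, ", ")
}

func main() {
	a, b := "01drv3mey2mzmayf50631tzp9m", "01drv3mf7b6fd1rraeq1f554fn"
	events := [][2]string{
		{"JobSubmittedEvent", a}, {"JobQueuedEvent", a},
		{"JobSubmittedEvent", b}, {"JobQueuedEvent", b},
		{"JobLeasedEvent", a}, {"JobLeasedEvent", b},
		{"JobPendingEvent", a}, {"JobPendingEvent", b},
		{"JobRunningEvent", b}, {"JobRunningEvent", a},
		{"JobSucceededEvent", b}, {"JobSucceededEvent", a},
	}
	t := NewTally()
	for _, e := range events {
		t.Observe(e[0], e[1])
		fmt.Println(t, "| event:", e[0])
	}
}
```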

Web UI:

Open [https://ui.demo.armadaproject.io](https://ui.demo.armadaproject.io) in your browser.

![Lookout UI](./quickstart/img/lookout.png "Lookout UI")
76 changes: 76 additions & 0 deletions docs/design/README.md
@@ -0,0 +1,76 @@
# System overview

This document is meant to be an overview of Armada for new users. We cover the architecture of Armada, show how jobs are represented, and explain how jobs are queued and scheduled.

If you just want to learn how to submit jobs to Armada, see:

- [User guide](../user.md)

If you want to see a quick overview of Armada's components, see:

- [Relationships diagram](./relationships_diagram.md)

## Architecture

Armada consists of two main components:
- The Armada server, which is responsible for accepting jobs from users and deciding in what order, and on which Kubernetes cluster, jobs should run. Users submit jobs to the Armada server through the `armadactl` command-line utility or via a gRPC or REST API.
- The Armada executor, of which there is one instance running in each Kubernetes cluster that Armada is connected to. Each Armada executor instance regularly notifies the server of how much spare capacity it has available and requests jobs to run. Users of Armada never interact with the executor directly.

All state relating to the Armada server is stored in [Redis](https://redis.io/), which may use replication combined with failover for redundancy. Hence, the Armada server is itself stateless and is easily replicated by running multiple independent instances. Both the server and the executors are intended to be run in Kubernetes pods. We show a diagram of the architecture below.

![How Armada works](./batch-api.svg)

### Job leasing

To avoid jobs being lost if a cluster or its executor becomes unavailable, each job assigned to an executor has an associated timeout. Armada executors are required to check in with the server regularly; if an executor responsible for running a particular job fails to check in within that timeout, the server re-schedules the job on another cluster.

## Jobs and job sets

A job is the most basic unit of work in Armada, and is represented by a Kubernetes pod specification (podspec) with additional metadata specific to Armada. Armada handles creating, running, and removing containers as necessary for each job. Hence, Armada is essentially a system for managing the life cycle of a set of containerised applications representing a batch job.

The Armada workflow is:

1. Create a job specification, which is a Kubernetes podspec with a few additional metadata fields.
2. Submit the job specification to one of Armada's job queues using the `armadactl` CLI utility or through the Armada gRPC or REST API.

For example, a job that sleeps for 60 seconds could be represented by the following yaml file.

```yaml
queue: test
jobSetId: set1
jobs:
  - priority: 0
    podSpecs:
      - terminationGracePeriodSeconds: 0
        restartPolicy: Never
        containers:
          - name: sleep
            imagePullPolicy: IfNotPresent
            image: busybox:latest
            args:
              - sleep
              - 60s
            resources:
              limits:
                memory: 64Mi
                cpu: 150m
              requests:
                memory: 64Mi
                cpu: 150m
```

In the above yaml snippet, each entry of `podSpecs` is a Kubernetes podspec, which consists of one or more containers that contain the user code to be run. In addition, the job specification (jobspec) contains metadata fields specific to Armada:

- `queue`: which of the available job queues the job should be submitted to.
- `priority`: the job priority (lower values indicate higher priority).
- `jobSetId`: jobs with the same `jobSetId` can be followed and cancelled in a single operation. The `jobSetId` has no impact on scheduling.
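The semantics of `priority` and `jobSetId` can be illustrated with a simplified model. The struct and functions below are hypothetical and do not mirror Armada's API types; the ordering ignores queue fair share and every other scheduling factor, and only shows that lower priority values come first while `jobSetId` plays no part in ordering but lets a whole set be cancelled at once.

```go
package main

import (
	"fmt"
	"sort"
)

// QueuedJob is a hypothetical, simplified view of a submitted job: only the
// Armada-specific metadata is kept and the podspec is omitted.
type QueuedJob struct {
	ID       string
	Queue    string
	JobSetID string
	Priority float64 // lower values indicate higher priority
}

// byPriority orders jobs so lower priority values come first. JobSetID is
// deliberately ignored because it has no impact on scheduling.
func byPriority(jobs []QueuedJob) []QueuedJob {
	out := append([]QueuedJob(nil), jobs...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Priority < out[j].Priority })
	return out
}

// cancelJobSet cancels every job in a job set in a single operation,
// returning the jobs that remain and the IDs that were cancelled.
func cancelJobSet(jobs []QueuedJob, jobSetID string) (remaining []QueuedJob, cancelled []string) {
	for _, j := range jobs {
		if j.JobSetID == jobSetID {
			cancelled = append(cancelled, j.ID)
		} else {
			remaining = append(remaining, j)
		}
	}
	return remaining, cancelled
}

func main() {
	jobs := []QueuedJob{
		{ID: "j1", Queue: "test", JobSetID: "set1", Priority: 1},
		{ID: "j2", Queue: "test", JobSetID: "set2", Priority: 0},
		{ID: "j3", Queue: "test", JobSetID: "set1", Priority: 0},
	}
	for _, j := range byPriority(jobs) {
		fmt.Println(j.ID, j.Priority)
	}
	remaining, cancelled := cancelJobSet(jobs, "set1")
	fmt.Println(len(remaining), cancelled)
}
```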

Queues and scheduling are explained in more detail below.

For more examples, see the [user guide](../user.md).

### Job events

A job event is generated whenever the state of a job changes (e.g., when changing from submitted to running or from running to completed) and is a timestamped message containing event-specific information (e.g., an exit code for a completed job). All events generated by jobs that are part of the same job set are grouped together and published via a [Redis stream](https://redis.io/topics/streams-intro). Each job set has its own stream, so clients can use the Armada API to subscribe only to the events generated by jobs in a particular set.

Armada records all events necessary to reconstruct the state of each job and, after a job has been completed, the only information retained about the job is the events generated by it.
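Grouping events per job set and rebuilding job state from them can be sketched with an in-memory stand-in for the Redis streams. The types and method names below are hypothetical; the sketch assumes events are published in the order they occur, so the latest state of a job is simply the type of its last event.

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical job event: a timestamped message about one job.
type Event struct {
	JobSetID string
	JobID    string
	Type     string // e.g. "submitted", "running", "succeeded"
	At       time.Time
	ExitCode int // event-specific data; only meaningful for completion events
}

// EventLog keeps one append-only stream per job set, standing in for the
// per-job-set Redis streams.
type EventLog struct {
	streams map[string][]Event
}

func NewEventLog() *EventLog { return &EventLog{streams: map[string][]Event{}} }

// Publish appends an event to the stream of its job set.
func (l *EventLog) Publish(e Event) {
	l.streams[e.JobSetID] = append(l.streams[e.JobSetID], e)
}

// Stream returns only the events of one job set, as a subscriber would see them.
func (l *EventLog) Stream(jobSetID string) []Event {
	return l.streams[jobSetID]
}

// States reconstructs each job's latest state by replaying its job set's stream.
func (l *EventLog) States(jobSetID string) map[string]string {
	states := map[string]string{}
	for _, e := range l.streams[jobSetID] {
		states[e.JobID] = e.Type
	}
	return states
}

func main() {
	events := NewEventLog()
	now := time.Now()
	events.Publish(Event{JobSetID: "set1", JobID: "j1", Type: "submitted", At: now})
	events.Publish(Event{JobSetID: "set2", JobID: "j2", Type: "submitted", At: now})
	events.Publish(Event{JobSetID: "set1", JobID: "j1", Type: "running", At: now.Add(time.Second)})
	events.Publish(Event{JobSetID: "set1", JobID: "j1", Type: "succeeded", At: now.Add(2 * time.Second), ExitCode: 0})
	fmt.Println(len(events.Stream("set1")), events.States("set1")) // 3 map[j1:succeeded]
}
```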