Commit 3c84e79 (parent f2fbfd2)

doc: add documentation of the roberta self-test

docs/self-test/roberta/README.md (1 file changed: 72 additions, 0 deletions)
# Run a RoBERTa Self-test

You are here because you want to run RoBERTa in your cluster in an
automated fashion. Periodic runs are currently supported through a
Kubernetes
[CronJob](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/).
For testing purposes, you may instead choose a one-off run.

## How it Works

The CronJob launches a Kubernetes
[Job](https://kubernetes.io/docs/concepts/workloads/controllers/job/),
which in turn spawns one or more Kubernetes
[Pods](https://kubernetes.io/docs/concepts/workloads/pods/). Each Pod
wakes up and launches the CodeFlare CLI against a given _profile_
(e.g. one that runs RoBERTa with 1 GPU and 8Gi of worker memory, from a
certain branch of the code base, and so on).
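To see this CronJob → Job → Pod chain on a live cluster, a helper like the one below should work. It is a sketch that assumes your current `kubectl` context and namespace are the ones you deployed into. The function name `show_self_test_runs` is hypothetical and is not part of the CodeFlare CLI. It relies on the `job-name` label, which the Job controller attaches to every Pod it creates.

```shell
# Hypothetical helper: summarize the CronJob -> Job -> Pod chain in the
# current namespace. Optionally pass a Job name to print that Job's logs.
show_self_test_runs() {
  kubectl get cronjobs
  kubectl get jobs
  # The Job controller labels each Pod it creates with job-name=<job>;
  # -L shows that label as a column, so you can see which Job owns which Pod.
  kubectl get pods -L job-name
  if [ -n "${1:-}" ]; then
    # kubectl resolves job/<name> to one of that Job's Pods.
    kubectl logs "job/$1"
  fi
}
```

For example, run `show_self_test_runs` to list recent runs, then `show_self_test_runs <job-name>`, using a Job name from that list.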

## Prerequisites

If your RoBERTa branch is private, you will need to provide a GitHub
username and token by creating a Kubernetes
[Secret](https://kubernetes.io/docs/concepts/configuration/secret/). This
Secret must be named `github` and must have two data fields,
`GITHUB_USER` and `GITHUB_TOKEN`. It should look something like:

```yaml
apiVersion: v1
data:
  GITHUB_USER: ... # output of `echo -n myusername | base64`
  GITHUB_TOKEN: ... # output of `echo -n $GITHUB_TOKEN | base64`
kind: Secret
metadata:
  name: github
type: Opaque
```
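If you would rather apply a manifest like the one above than use `kubectl create secret`, you can generate the encoded values in the shell. This is a sketch that assumes `GITHUB_USER` and `GITHUB_TOKEN` are already set in your shell. The name `github_secret_manifest` is hypothetical. `printf '%s'` avoids the trailing newline that a plain `echo` would add, which is the same reason the comments above use `echo -n`. `tr -d '\n'` strips any line wrapping that some `base64` implementations insert.

```shell
# Hypothetical helper: print the `github` Secret manifest with base64-encoded
# data fields taken from $GITHUB_USER and $GITHUB_TOKEN.
github_secret_manifest() {
  local user_b64 token_b64
  user_b64=$(printf '%s' "$GITHUB_USER" | base64 | tr -d '\n')
  token_b64=$(printf '%s' "$GITHUB_TOKEN" | base64 | tr -d '\n')
  cat <<EOF
apiVersion: v1
data:
  GITHUB_USER: ${user_b64}
  GITHUB_TOKEN: ${token_b64}
kind: Secret
metadata:
  name: github
type: Opaque
EOF
}
```

To create the Secret with it, run `github_secret_manifest | kubectl apply -f -`.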

## How to Deploy the Automation

### Create a `github` secret, if needed

```shell
export GITHUB_USER=myusername
```

```shell
export GITHUB_TOKEN=mygithubtoken
```

`kubectl` base64-encodes `--from-literal` values itself, so pass them
unencoded:

```shell
kubectl create secret generic github \
  --from-literal=GITHUB_USER="$GITHUB_USER" \
  --from-literal=GITHUB_TOKEN="$GITHUB_TOKEN"
```
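Once the Secret exists, you can read one field back and decode it once. It should print your plain username. This is a quick check that assumes the Secret lives in your current namespace. The helper name `check_github_secret` is hypothetical.

```shell
# Hypothetical helper: print the decoded GITHUB_USER stored in the `github` Secret.
# If this prints base64 text instead of your username, the value was encoded twice.
check_github_secret() {
  kubectl get secret github -o jsonpath='{.data.GITHUB_USER}' | base64 --decode
  echo
}
```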

### Deploy the automation

```shell
kubectl apply -f https://raw.githubusercontent.com/project-codeflare/codeflare-cli/main/deploy/self-test/roberta/1gpu/periodic.yaml
```

Use `delete` in place of `apply` if you wish to tear down the
automation. Replace `periodic.yaml` with `once.yaml` if you want a
one-off run.
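Both choices, `apply` or `delete` and `periodic` or `once`, can be wrapped in a small helper. This is a sketch with three assumptions. First, `roberta_self_test` is a hypothetical name. Second, `once.yaml` sits next to `periodic.yaml` in the `1gpu` directory, as the text above implies. Third, the helper uses the raw file URL (raw.githubusercontent.com) so that `kubectl` receives the YAML itself rather than GitHub's HTML page for the file.

```shell
# Hypothetical helper. Usage: roberta_self_test apply|delete periodic|once
roberta_self_test() {
  local action=${1:-} schedule=${2:-}
  case "$action" in
    apply|delete) ;;
    *) echo "usage: roberta_self_test apply|delete periodic|once" >&2; return 1 ;;
  esac
  case "$schedule" in
    periodic|once) ;;
    *) echo "usage: roberta_self_test apply|delete periodic|once" >&2; return 1 ;;
  esac
  # Raw file URL, so kubectl receives YAML rather than an HTML page.
  kubectl "$action" -f "https://raw.githubusercontent.com/project-codeflare/codeflare-cli/main/deploy/self-test/roberta/1gpu/${schedule}.yaml"
}
```

For example, `roberta_self_test apply once` starts a one-off run, and `roberta_self_test delete periodic` tears down the periodic automation.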

## TODO

- [ ] Implement and document how to point to a specific branch of the
  RoBERTa code base. This will probably require the use of a
  ConfigMap.

- [ ] Add Slack integration to inform a team of self-test failures.
