diff --git a/README.md b/README.md
index 5c3a9da1..70b35f7c 100644
--- a/README.md
+++ b/README.md
@@ -1,9 +1,12 @@
-# Infra-actions documentation
+# Infra-actions
 
 ## GitHub
- - [Using the GitHub actions matrix strategy](./docs/GITHUB_ACTIONS.md)
- - [Self-hosted GitHub action runners](./github-runner-provisioner/README.md)
-## Clusters
- - [Cluster provisioning with custom manifests](./setup-cluster/README.md)
-## Dev loop
- - [DEVELOPING.md](docs/DEVELOPING.md)
+
+- [GitHub Actions for Test Matrices](docs/GITHUB_ACTIONS.md)
+- [Custom GitHub action runners](docs/ACTION_RUNNERS.md)
+- [Self-hosted GitHub action runners](github-runner-provisioner/README.md)
+
+## Development
+
+- [Working with GitHub workflows and actions](docs/DEVELOPING.md)
+- [Provision Cluster GitHub Action](provision-cluster/README.md)
diff --git a/docs/ACTION_RUNNERS.md b/docs/ACTION_RUNNERS.md
index 2f515b0d..875c8f7e 100644
--- a/docs/ACTION_RUNNERS.md
+++ b/docs/ACTION_RUNNERS.md
@@ -1,21 +1,20 @@
 # Custom GitHub action runners
 
-There are self-hosted Mac M1 and Ubuntu ARM64 runners available for GitHub actions. There runners are EC2 instances
-hosted in AWS.
+There are self-hosted Mac M1 and Ubuntu ARM64 runners available for GitHub actions. These runners are EC2 instances hosted in AWS.
 
 In the future, we may make additional runners available depending on the needs of the different teams.
 
 ## Repository configuration
 
 Before a job can use a self-hosted runner, the following settings need to be configured in the GitHub repository:
- 
+
 1. Add the `d6e-automaton` account as a repo administrator (`Repo ⇾ Settings ⇾ Collaborators and teams`)
 2. Add a webhook (`Repo ⇾ Settings ⇾ Webhooks`) with the following settings:
    1. Payload URL: `https://sw.bakerstreet.io/github-runner-provisioner/`
    2. Content type: `application/x-www-form-urlencoded`
    3. Secret: Enter the value found in `/keybase/team/datawireio/secrets/github-actions/github-infra-actions`
    4. SSL verification: `Enable`
-   5. Which events trigger the webhook? `Let me select individual events` ⇾ `Workflow jobs`
+   5. Which events trigger the webhook? `Let me select individual events` ⇾ `Workflow jobs`
 
 Once the webhook is configured, you can use the runners as described below.
 
@@ -24,33 +23,28 @@ Once the webhook is configured, you can use the runners as described below.
 There are self-hosted Mac M1 (ARM64) runners that can be used in a workflow by using `runs-on: macOS-arm64`.
 
 ```yaml
-...
 jobs:
-    my_job:
-        runs-on: macOS-arm64
-        steps:
-        # The provision-cluster action will automatically register a cleanup hook to remove the
-        # cluster it provisions when the job is done.
-        - uses: actions/checkout@v3
-        ...
+  my_job:
+    runs-on: macOS-arm64
+    steps:
+      # The provision-cluster action will automatically register a cleanup hook to remove the
+      # cluster it provisions when the job is done.
+      - uses: actions/checkout@v3
 ```
 
 The following limitations apply to Mac M1 runners:
+
 - It will take between 30 minutes and up to 3 hours for a runner to be available from the moment it is requested by a job.
-- There is a limit of 10 active Mac M1 runners. Any build that requests a Mac M1 during this time will
-  stay in a queued state until a runner is available. If a job is queued for more than 24 hours, it will be marked as failed.
-- Once a Mac M1 runner is created, it will continue to run for up to 24 hours, picking-up oe or more jobs. What the means
-  is that jobs are responsible for ensuring that runners are in a clean state before they are used.
+- There is a limit of 10 active Mac M1 runners. Any build that requests a Mac M1 runner while all 10 are in use will stay in a queued state until a runner is available. If a job is queued for more than 24 hours, it will be marked as failed.
+- Once a Mac M1 runner is created, it will continue to run for up to 24 hours, picking up one or more jobs. This means that jobs are responsible for ensuring that runners are in a clean state before they are used.
 
 ## Ubuntu ARM64 runners
 
-These self-hosted runners are created on-demand. It takes about a minute for the runner to be available, and once the
-job finishes, they are destroyed.
+These self-hosted runners are created on demand. It takes about a minute for a runner to become available, and it is destroyed once the job finishes.
 
 To request one, use label `ubuntu-arm64`:
 
 ```yaml
-...
 jobs:
   my_job:
     runs-on: ubuntu-arm64
@@ -58,5 +52,4 @@ jobs:
       # The provision-cluster action will automatically register a cleanup hook to remove the
       # cluster it provisions when the job is done.
       - uses: actions/checkout@v3
-      ...
-```
\ No newline at end of file
+```
diff --git a/docs/DEVELOPING.md b/docs/DEVELOPING.md
index a6787042..900896ac 100644
--- a/docs/DEVELOPING.md
+++ b/docs/DEVELOPING.md
@@ -4,13 +4,12 @@ GitHub workflows and any actions used by them can be tested locally using [act](
 
 Once `act` is installed, it can be invoked from the repository root like this:
 
-```
+```shell
 act pull_request
 ```
 
-`act` can pass secrets with the command line option `-s`. For example, to pass a secret called `KUBECEPTION_TOKEN` run it
-like this:
+`act` can pass secrets with the command-line option `-s`. For example, to pass a secret called `KUBECEPTION_TOKEN`, run it like this:
 
-```
+```shell
 act pull_request -s KUBECEPTION_TOKEN=MY_TOKEN
 ```
diff --git a/docs/GITHUB_ACTIONS.md b/docs/GITHUB_ACTIONS.md
index 6cb7116a..c435bcf2 100644
--- a/docs/GITHUB_ACTIONS.md
+++ b/docs/GITHUB_ACTIONS.md
@@ -1,44 +1,36 @@
 # Github Actions for Test Matrices
 
-This repository hosts github actions that can be used to provision and configure kubernetes
-clusters. These are intended to facilitate building out a comprehensive [test
-matrix](../.github/workflows/matrix.yaml) suitable for use in real-world large scale integration and
-compatibility testing for both telepresence and edge-stack.
+This repository hosts GitHub Actions that can be used to provision and configure Kubernetes clusters. These are intended to facilitate building out a comprehensive [test matrix](../.github/workflows/matrix.yaml) suitable for real-world, large-scale integration and compatibility testing for both Telepresence and Edge Stack.
 
-The [matrix workflow](../.github/workflows/matrix.yaml) illustrates an exemplary usage of these
-actions.
+The [matrix workflow](../.github/workflows/matrix.yaml) illustrates how to use these actions.
 
 ## Cluster Provisioning
 
-The [provision-cluster](../provision-cluster/README.md) action can be used to provision different
-varieties of clusters:
+The [provision-cluster](../provision-cluster/README.md) action can be used to provision different varieties of clusters:
 
 - Kubeception (k3s based)
 - GKE
 - EKS (unimplemented)
 - AKS (unimplemented)
 
-By including this github action in your workflow you can easily run the same test suite against any
-supported set of clusters:
+By including this GitHub action in your workflow, you can easily run the same test suite against any supported set of clusters:
 
 ```yaml
-...
 jobs:
-...
   my_matrix_job:
     strategy:
       matrix:
         clusters:
-        - distribution: GKE
-          version: "1.23"
-          useAuthProvider: "false"
-        - distribution: GKE
-          version: "1.23"
-          useAuthProvider: "true"
-        - distribution: AKS
-          version: "1.22"
-        - distribution: Kubeception
-          version: "1.23"
+          - distribution: GKE
+            version: "1.23"
+            useAuthProvider: "false"
+          - distribution: GKE
+            version: "1.23"
+            useAuthProvider: "true"
+          - distribution: AKS
+            version: "1.22"
+          - distribution: Kubeception
+            version: "1.23"
     steps:
       # The provision-cluster action will automatically register a cleanup hook to remove the
       # cluster it provisions when the job is done.
@@ -54,19 +46,14 @@ jobs:
           useAuthProvider: ${{ matrix.clusters.useAuthProvider }}
 
       - run: make tests
-...
 ```
 
 The following inputs apply only to GKE clusters:
 
-`useAuthProvider`: If set to "true", Authentication is done using an authentication provider, like the
-[gke-gcloud-auth-plugin](https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke).
-
+- `useAuthProvider`: If set to "true", authentication is done using an authentication provider, such as the [gke-gcloud-auth-plugin](https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke).
 
 The action returns the following outputs:
 
-`clusterName`: Name of the cluster.
-
-`projectId`: For GKE, the project ID. Undefined for other cluster providers.
-
-`location`: For GKE, the cluster location (region or zone). Undefined for other cluster providers.
\ No newline at end of file
+- `clusterName`: Name of the cluster.
+- `projectId`: For GKE, the project ID. Undefined for other cluster providers.
+- `location`: For GKE, the cluster location (region or zone). Undefined for other cluster providers.
diff --git a/github-runner-provisioner/README.md b/github-runner-provisioner/README.md
index dfbb5c8c..79d9bd39 100644
--- a/github-runner-provisioner/README.md
+++ b/github-runner-provisioner/README.md
@@ -1,71 +1,56 @@
-# Runner Service
+# Self-hosted GitHub action runners
 
-This service is based on the [echo
-template](https://github.com/datawire/infrastructure/tree/master/echo). Please view the
-[README](https://github.com/datawire/infrastructure/tree/master/echo) for details about the dev loop
-and how it works.
+This service is based on the [echo template](https://github.com/datawire/infrastructure/tree/master/echo). Please view the [README](https://github.com/datawire/infrastructure/tree/master/echo) for details about the dev loop and how it works.
 
-# Architecture
+## Architecture
 
-We use the GitHub-Runner-Provisioner to serve a webhook to GitHub Actions. GitHub will send any
-Actions events to the GRP running in Skunkworks, which will parse those events looking for
-workflows that request special labels in their `runs-on` property.
+We use the GitHub-Runner-Provisioner (GRP) to serve a webhook to GitHub Actions. GitHub will send any Actions events to the GRP running in Skunkworks, which will parse those events looking for workflows that request special labels in their `runs-on` property.
 
-Using the GitHub Self-Hosted Runner binaries we then spin up the custom runners in one of our
-supported runner providers - currently AWS and CodeMagic. Supported runners are configured in
-[runner.go](runner.go).
+Using the GitHub Self-Hosted Runner binaries, we then spin up the custom runners in one of our supported runner providers (currently AWS and CodeMagic). Supported runners are configured in [runner.go](runner.go).
 
-## AWS
+### AWS
 
-AWS runners are created in EC2 using the AWS SDK. See the [aws_runners](internal/aws/runners)
-package for details on the implementation.
+AWS runners are created in EC2 using the AWS SDK. See the [aws_runners](internal/aws/runners) package for details on the implementation.
 
-## CodeMagic
+### CodeMagic
 
-CodeMagic runners are actually CodeMagic Builds (CI jobs in their service) that then pull the
-GitHub Self-Hosted binaries and register themselves as ephemeral (single-use) runners - picking
-up a single job from the calling repo and then terminating.
+CodeMagic runners are actually CodeMagic Builds (CI jobs in their service) that pull the GitHub Self-Hosted Runner binaries and register themselves as ephemeral (single-use) runners, each picking up a single job from the calling repo and then terminating.
 
-# Testing the application
+## Testing the application
 
-## Integration Tests
+### Integration Tests
 
 **Note**: Before running tests, make sure you run the application with environment variable `WEBHOOK_TOKEN=FAKE_TOKEN`.
 
-You will also need to set `GITHUB_TOKEN` to a PAT for the D6E Automaton. These values can all be found in the
-[github-runner-provisioner-secrets.yaml](/keybase/team/datawireio/skunkworks/github-runner-provisioner-secrets.yaml)
-file in Keybase - you will need to base64 decode them before use. If only running dry-runs only AWS and GitHub
-authentication is required.
-To test the application we use targets in the Makefile. The `make go-unit-tests` target will run the unit tests,
-and `make test-runners` will run the integration tests against the dry-run endpoints. Note that to test the
-AWS `macOS-arm64` runner you will need to set the `USE_CODEMAGIC` environment variable to `true` in the GRP.
+You will also need to set `GITHUB_TOKEN` to a personal access token (PAT) for the D6E Automaton. These values can all be found in the [github-runner-provisioner-secrets.yaml](/keybase/team/datawireio/skunkworks/github-runner-provisioner-secrets.yaml) file in Keybase; you will need to base64-decode them before use. If you are only running dry runs, only AWS and GitHub authentication is required.
+
+To test the application, we use targets in the Makefile. The `make go-unit-tests` target will run the unit tests, and `make test-runners` will run the integration tests against the dry-run endpoints. Note that to test the AWS `macOS-arm64` runner you will need to set the `USE_CODEMAGIC` environment variable to `true` in the GRP.
+
+Testing CodeMagic M1 & AWS ubuntu-arm64:
 
-Testing CodeMagic M1 & AWS ubuntu-arm64:
 ```bash
-  USE_CODEMAGIC=true GITHUB_TOKEN= go run main.go --dry-run
-  make test-runners
+USE_CODEMAGIC=true GITHUB_TOKEN= go run main.go --dry-run
+make test-runners
 ```
 
-**Note**: You can send requests to the production client using `make run-` Be careful when sending
-requests to production using an HTTP client, since the `dry-run`
-request parameter defaults to true. This is necessary because we have no way to set GitHub to send this
-parameter.
+**Note**: You can send requests to the production client using `make run-`. Be careful when sending requests to production using an HTTP client, since the `dry-run` request parameter defaults to true. This is necessary because we have no way to set GitHub to send this parameter.
 
-## Unit tests
+### Unit tests
 
-Some unit tests use mocks generated by gomock. If the interface being mocked is updated, you may have to re-generate the
-mocks by running:
+Some unit tests use mocks generated by gomock. If the interface being mocked is updated, you may have to regenerate the mocks by running:
 
 ```shell
-make update-go-mocks
+make update-go-mocks
 ```
 
-# Env Vars
+## Env Vars
+
 The runner provisioner requires the following variables to be configured:
-- `GITHUB_TOKEN` - a personal access token with admin access to the repo configuring the runners.
-We use the `D6E-Automaton`'s token in production.
-- `WEBHOOK_TOKEN` - the secret used to configure the webhook in GitHub. We use the token stored at
-`/Keybase/team/datawireio/infra/github-runner-provisioner-secrets`
+
+- `GITHUB_TOKEN` - a personal access token with admin access to the repo configuring the runners.
+  We use the `D6E-Automaton`'s token in production.
+- `WEBHOOK_TOKEN` - the secret used to configure the webhook in GitHub. We use the token stored at
+  `/Keybase/team/datawireio/infra/github-runner-provisioner-secrets`
 - `CODEMAGIC_TOKEN` - the secret used to authenticate to the CodeMagic build API to trigger M1 runners
 - `USE_CODEMAGIC` - a boolean flag to indicate whether to use CodeMagic or AWS to provision M1 runners
-- AWS auth can be configured with `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` or by using the aws cli
\ No newline at end of file
+- AWS auth can be configured with `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` or by using the AWS CLI
diff --git a/provision-cluster/README.md b/provision-cluster/README.md
index 11e06b55..c1728e4e 100644
--- a/provision-cluster/README.md
+++ b/provision-cluster/README.md
@@ -1,30 +1,27 @@
-# Documentation to enable developing and releasing the items in this repository.
+# Provision Cluster GitHub Action
 
-## Releasing the provision-cluster GitHub Action:
+## Releasing the provision-cluster GitHub Action
 
-GitHub Actions are released by creating a semver tag and pushing it to GitHub. No additional steps
-are needed.
+GitHub Actions are released by creating a semver tag and pushing it to GitHub. No additional steps are needed.
 
 ### Step 1: Query existing tags
 
-Use `git pull` to make sure you have all tags locally and then use `git tag -l` to find existing tag
-names. Release tags are of the form `vX.Y.Z` and release versions should follow semver.
+Use `git pull` to make sure you have all tags locally, and then use `git tag -l` to find existing tag names. Release tags are of the form `vX.Y.Z`, and release versions should follow semver.
 
 ### Step 2: Tag with your new version number
 
-Use `git tag vX.Y.Z` to tag with your new version number, and then run `git push --tags` to push the
-new tag up to GitHub.
+Use `git tag vX.Y.Z` to tag with your new version number, and then run `git push --tags` to push the new tag up to GitHub.
 
-### Step 3: Verify the release works by updating the smoke test workflow.
+### Step 3: Verify the release works by updating the smoke test workflow
 
-Once the tag is pushed, then verify the release by using it in the smoke test workflow. Do this by
-editing `.github/workflows/smoke.yaml`, search for the uses line and update the version to the newly
-released tag, e.g.:
+Once the tag is pushed, verify the release by using it in the smoke test workflow. To do this, edit `.github/workflows/smoke.yaml`, find the `uses` line, and update the version to the newly released tag:
 
-```
-...
-    - uses: datawire/infra-actions/provision-cluster@vX.Y.Z
-...
+```yaml
+jobs:
+  release_smoke:
+    steps:
+      - id: provision
+        uses: datawire/infra-actions/provision-cluster@vX.Y.Z
 ```
 
 Pushing the tag should trigger the release smoke test workflow. Verify that this has in fact passed.
diff --git a/scripts/README.md b/scripts/README.md
deleted file mode 100644
index d5e2637c..00000000
--- a/scripts/README.md
+++ /dev/null
@@ -1,8 +0,0 @@
-# GitHub Actions Time
-
-A simple python script that pulls data from the GitHub Actions API and does some basic analysis to help
-you understand how long your workflows are taking to run.
-
-Right now it has a single command but this could be expanded if we wanted to know more in the future.
-
-Run it with Python 3.11, as of now all requirements are in the standard library.
\ No newline at end of file
diff --git a/setup-cluster/README.md b/setup-cluster/README.md
deleted file mode 100644
index f0d8e247..00000000
--- a/setup-cluster/README.md
+++ /dev/null
@@ -1,52 +0,0 @@
-# Cluster Setup
-
-Note: This section describes how this feature would work but it's not implemented.
-
-
-Use the `setup-cluster` action as described below:
-
-```yaml
-  - uses: ./setup-cluster
-    with:
-      # Tells setup-cluster which kubeconfig file to use.
-      kubeconfig: path/to/kubeconfig.yaml
-
-      # Tells setup-cluster where to find manifests. This can be a URL or a path in the
-      # filesystem. The manifests can be either raw yaml or kustomize based manifests.
-      manifests: url-or-path-to-manifests
-```
-
-
-The [setup-cluster](../setup-cluster/README.md) action can be used to configure a cluster with a given
-set of manifests required for test execution. The action will not only intelligently apply the
-manifests (dealing with any interdependencies), but also ensure that all deployments, statefulsets,
-daemonsets, etc, are fully available, ready, and passing their health checks before allowing the job
-to proceed:
-
-```yaml
-...
-jobs:
-...
-  my_matrix_job:
-    ...
-    steps:
-    - uses: datawire/infra-actions/provision-cluster@v0.2.0
-      with:
-        distribution: ${{ matrix.clusters.distribution }}
-        version: ${{ matrix.clusters.version }}
-        # Tells provision-cluster where to write the kubeconfig file.
-        kubeconfig: kubeconfig.yaml
-        # For convenience, the provision-cluster manifest will invoke the setup-cluster manifest if you
-        # pass in a pointer to manifests, however you can also use it independently as shown below just
-        # in case your cluster does not come from the ./provision-cluster action.
-        #
-        # manifests:
-    - uses: ./setup-cluster
-      with:
-        # Tells setup-cluster which kubeconfig file to use.
-        kubeconfig: kubeconfig.yaml
-        # The manifests parameter can point to a file or url and can include raw yaml or kustomized manifests.
-        manifests: https://github.com/datawire/my-repo/manifests.yaml
-    - run: KUBECONFIG=kubeconifig.yaml make tests
-...
-```