126 commits
4508fc7
Create integration-workflow.yml
kaylawilding Oct 11, 2025
f77fdd9
update inference pipeline name & python version
Oct 31, 2025
61908e2
adding two integration tests - one weekly that will be a health check…
Nov 1, 2025
a4c05b6
removing tags
Nov 1, 2025
6eaa4b3
Merge branch 'develop' into feature/ci_cd_github_actions
vishpillai123 Nov 1, 2025
6af5b97
underscores not dashes
Nov 1, 2025
4650e34
testing weekly integration test
Nov 3, 2025
8c07473
correcting syntax
Nov 3, 2025
0f7b13f
ok
Nov 4, 2025
b911c0d
refactoring auth
Nov 4, 2025
bc357f8
databricks.yml include paths
Nov 4, 2025
afd1210
debug paths
Nov 4, 2025
84a6016
setting working directory
Nov 4, 2025
cdf5577
uh what's happening
Nov 4, 2025
a59dea5
removing cohort file name
Nov 4, 2025
e2c7aec
adding git_ref in deployment
Nov 4, 2025
7fbd201
changing sp run as to ds run as
Nov 4, 2025
ab9613c
removing pipeline sa email... that's not a thing
Nov 4, 2025
2f9f201
removing other unnecessary vars
Nov 4, 2025
9b5a97f
debugging auth issues
Nov 4, 2025
46ef8b2
looking at dev_host or dev_token
Nov 4, 2025
5cdc0ab
testing preflight
Nov 4, 2025
eeb0f04
workiing on adding env secrets to github
Nov 6, 2025
1d1c20f
using client ID and client secret
Nov 6, 2025
68f3f50
testing auth preflight for logging
Nov 6, 2025
afd3672
trying again
Nov 6, 2025
81dc61a
preflight
Nov 6, 2025
bd1f302
ok
Nov 6, 2025
f4d0667
preflight ok
Nov 6, 2025
e130e75
ok
Nov 6, 2025
35dd4be
wait do we need a preflight?
Nov 6, 2025
4960f70
changing run commands
Nov 7, 2025
d976f88
setting working-directory
Nov 7, 2025
7b9d7e0
dev host and dev client id and client secret
Nov 7, 2025
946f2c0
fixing bundle run
Nov 7, 2025
321332f
umm trying to attach git ref to a commit
Nov 7, 2025
a84fb9b
trying to override git source during dev
Nov 7, 2025
9451da9
ok
Nov 7, 2025
b3b5ead
trying to specify git source directly in the training & inference
Nov 7, 2025
c8ea2a5
trying to remove git ref
Nov 7, 2025
c26942c
trying different things
Nov 7, 2025
f385e03
setting empty string defaults -> maybe this works?
Nov 7, 2025
2c0026d
fixing model name and rerunning inference'
Nov 7, 2025
fa86a42
defining the model..
Nov 7, 2025
167e1f6
ok adding the model name and model type under params and not var
Nov 7, 2025
6d0130c
ok for some reason it doesn't like params spread on multiple lines
Nov 7, 2025
51d5e15
indents
Nov 7, 2025
3932020
spaces.. syntax ugh
Nov 7, 2025
f2df4bc
adding dk not email and dk cc email
Nov 7, 2025
7c3c48d
defining DK CC email
Nov 7, 2025
6917999
trying this again
Nov 7, 2025
2249e83
EMAIL instead of email
Nov 7, 2025
98087d1
correcting run_id
Nov 7, 2025
a0fdb93
adding weekly cleanup on synthetic_integration
Nov 7, 2025
8dfa8a9
need to set up cluster ugh
Nov 7, 2025
696d96d
lint/style
Nov 7, 2025
3a9186f
updating actions with synthetic-integration
Nov 7, 2025
f138e76
testing if i can deploy create the cleanup job
Nov 7, 2025
f056f42
trying again
Nov 7, 2025
83cdbf0
trying again
Nov 7, 2025
fa4dc9d
adding git_url
Nov 7, 2025
d19752a
let's try running cleanup
Nov 7, 2025
39eecad
adding checkout schedule
Nov 8, 2025
789ce94
adding group_to_manage to deploy&run
Nov 10, 2025
7a88d8a
indents
Nov 10, 2025
3088d93
adding variables from bundle into run
Nov 10, 2025
eb07df1
creating cluster similar to how we're doing for training/inference
Nov 10, 2025
4b40eb1
new pipeline cluster name
Nov 10, 2025
82eb96d
adding file properly in .yml
Nov 10, 2025
d2ce722
ok let's try actually deleting
Nov 10, 2025
19eb0cd
changing recursive -> recurse
Nov 10, 2025
95944a8
adding model deletion
Nov 10, 2025
c904dee
trying this again
Nov 10, 2025
570fc4c
deleting unnecessary args
Nov 10, 2025
7aae3d7
attempting cleanup again
Nov 10, 2025
38c1e8a
re-running pipeline
Nov 10, 2025
18ffd09
ok let's try a dry run
Nov 10, 2025
d27145e
can we delete the experiment'
Nov 11, 2025
58cb50b
checking if retention days works
Nov 11, 2025
5680020
testing metadata update
Nov 11, 2025
55d8f7e
ok
Nov 11, 2025
71fa55d
ok updating metadata
Nov 11, 2025
9b18cb3
test: release integration
Nov 11, 2025
40b309a
test: retrying release integration
Nov 11, 2025
6488333
feat: adding dev_prod so that we can use dev_sst_02 and prod as a target
Nov 11, 2025
ade047c
fix: trying to force dev_prod to bundle in the right place
Nov 11, 2025
4b13b48
fix: removing hardcoded ids. we'll just accept deploying from dev_pro…
Nov 11, 2025
054d4a3
fix: db prefix name needs underscore not dash
Nov 11, 2025
edd4c4c
fix: release integration test works, reverting back to trigger on rel…
Nov 11, 2025
847cc8d
style
Nov 11, 2025
81ecec4
type check
Nov 12, 2025
3b21870
style
Nov 12, 2025
999a69e
testing release integration with just using dev
Nov 12, 2025
e23d437
test: release integration
Nov 12, 2025
a299ce4
test: release integration
Nov 12, 2025
47ec1a3
test: deploy-main on dev-sst-02
Nov 12, 2025
ecebbd4
test: deploy-main on dev-sst-02
Nov 13, 2025
3520965
test: deploy-main on dev-sst-02
Nov 13, 2025
2462e01
test: deploy-main on dev-sst-02
Nov 13, 2025
603e043
test: deploy-main on dev-sst-02
Nov 13, 2025
694d1bd
test: deploy-main on dev-sst-02
Nov 13, 2025
039ee37
test: deploy-main on dev-sst-02
Nov 13, 2025
3cfc7d0
test: deploy-main on dev-sst-02
Nov 13, 2025
e765fae
test: deploy-main on dev-sst-02
Nov 13, 2025
db5a4e7
test: adding host client id and client secret in env
Nov 13, 2025
b85209b
test: deploying on dev first
Nov 13, 2025
d6e547f
test: deploy on dev
Nov 13, 2025
07a868e
test: deploy on prod
Nov 13, 2025
8af7bc6
test: dev
Nov 13, 2025
c6fae92
test: prod
Nov 13, 2025
7dd4068
test: attempting to update permissions to see if this works
Nov 13, 2025
584b1bc
fix: syntax
Nov 13, 2025
7603940
test: dev
Nov 13, 2025
f9267cf
test: dev
Nov 13, 2025
2999644
changing to 'CAN_MANAGE' instead
Nov 13, 2025
47f11ae
test: prod
Nov 13, 2025
6d834cd
test: printing out auth
Nov 13, 2025
71bac76
fix: reverting back to original state with deploy-main.. prod deploym…
Nov 13, 2025
4833eea
style
Nov 13, 2025
6de35f4
test: prod
Nov 13, 2025
550bbc2
fix: syntax
Nov 13, 2025
fea09e1
test: prod
Nov 13, 2025
5b96e65
test: checking bundle summary
Nov 13, 2025
a461ca2
test: prod
Nov 13, 2025
7e074f5
chore: uncommenting from test, ready to review
Nov 13, 2025
bb1c62e
fix: style
Nov 13, 2025
140 changes: 140 additions & 0 deletions .github/workflows/deploy-main.yml
@@ -0,0 +1,140 @@
name: deploy-main

on:
  # deploy only when version tags are pushed
  push:
    tags:
      - 'v*'

  workflow_dispatch:
    inputs:
      ref:
        description: 'Git ref (tag/branch/commit) to deploy'
        required: false
        default: ''

concurrency:
  group: deploy-prod-${{ github.ref_name }}-${{ github.job }}
  cancel-in-progress: false

jobs:
  # PROD (staging-sst-01)
  deploy-staging:
    if: ${{ startsWith(github.ref, 'refs/tags/v') || github.event_name == 'workflow_dispatch' }}
    name: Deploy DAB to staging_sst_01 instance
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11.11'

      - name: Install uv and Databricks CLI
        run: |
          python -m pip install --upgrade pip
          pip install uv
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
          databricks -v
        shell: bash

      - name: Install project dependencies (uv)
        run: |
          uv venv
          uv pip install .

      - name: Configure Databricks env (Dev)
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
        run: |
          databricks version

      - name: Deploy bundle (prod target -> staging_sst_01)
        working-directory: pipelines/pdp
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
          DB_WORKSPACE: ${{ secrets.STAGING_DB_WORKSPACE }}
          SA_EXECUTER: ${{ secrets.STAGING_SERVICE_ACCOUNT_EXECUTER }}
          DS_RUN_AS: ${{ secrets.STAGING_DS_RUN_AS }}
          GROUP_TO_MANAGE: ${{ secrets.GROUP_TO_MANAGE }}
          DATAKIND_EMAIL: ${{ secrets.DATAKIND_EMAIL }}
        run: |
          echo "Deploying DAB to prod for tag $GITHUB_REF_NAME..."
          databricks bundle validate --target=prod
          databricks bundle deploy \
            --target=prod \
            --var="DB_workspace=$DB_WORKSPACE" \
            --var="service_account_executer=$SA_EXECUTER" \
            --var="ds_run_as=$DS_RUN_AS" \
            --var="databricks_institution_name=midway_uni" \
            --var="datakind_group_to_manage_workflow=$GROUP_TO_MANAGE" \
            --var="datakind_notification_email=$DATAKIND_EMAIL" \
            --var="git_tag=${GITHUB_REF_NAME}"

  # DEV (dev-sst-02)
  deploy-dev:
    if: ${{ startsWith(github.ref, 'refs/tags/v') }}
    name: Deploy DAB to dev_sst_02 instance
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref || github.ref }}

      - uses: actions/setup-python@v5
        with:
          python-version: '3.11.11'

      - name: Install uv and Databricks CLI
        run: |
          python -m pip install --upgrade pip
          pip install uv
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
          databricks -v
        shell: bash

      - name: Install project dependencies (uv)
        run: |
          uv venv
          uv pip install .

      - name: Configure Databricks env (Dev)
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
        run: |
          databricks version

      - name: Deploy bundle (prod target -> dev_sst_02)
        working-directory: pipelines/pdp
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
          DB_WORKSPACE: ${{ secrets.DEV_DB_WORKSPACE }}
          SA_EXECUTER: ${{ secrets.DEV_SERVICE_ACCOUNT_EXECUTER }}
          DS_RUN_AS: ${{ secrets.DEV_DS_RUN_AS }}
          GROUP_TO_MANAGE: ${{ secrets.GROUP_TO_MANAGE }}
          DATAKIND_EMAIL: ${{ secrets.DATAKIND_EMAIL }}
        run: |
          echo "Deploying PROD target to dev_sst_02 for tag ${GITHUB_REF_NAME}..."
          databricks bundle deploy \
            --target=prod \
            --var="DB_workspace=$DB_WORKSPACE" \
            --var="service_account_executer=$SA_EXECUTER" \
            --var="ds_run_as=$DS_RUN_AS" \
            --var="databricks_institution_name=synthetic" \
            --var="datakind_group_to_manage_workflow=$GROUP_TO_MANAGE" \
            --var="datakind_notification_email=$DATAKIND_EMAIL" \
            --var="git_tag=${GITHUB_REF_NAME}"
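
Both deploy steps above pass several secrets into `databricks bundle deploy` as `--var` values. An unset secret expands to an empty string, so the deploy may proceed with a blank variable. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper and is not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): succeed only if every named
# environment variable is set and non-empty.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Inside a `run:` block this could be called as `require_env DB_WORKSPACE SA_EXECUTER DS_RUN_AS GROUP_TO_MANAGE DATAKIND_EMAIL || exit 1` before the deploy command, so a missing secret stops the job with a named error.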
108 changes: 108 additions & 0 deletions .github/workflows/release-integration.yml
@@ -0,0 +1,108 @@
name: release-branch-ci-dev

on:
  # run when release branch is created from git flow
  push:
    branches:
      - 'release/*'

concurrency:
  group: release-branch-ci-dev-${{ github.ref }}
  cancel-in-progress: false

jobs:
  dev-train-infer:
    name: Train + Inference on Dev (release branch)
    runs-on: ubuntu-latest
    timeout-minutes: 120
    env:
      CONFIG_FILE: config.toml
      DB_RUN_ID_PREFIX: release_${{ github.sha }}

    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }

      - uses: actions/setup-python@v5
        with: { python-version: "3.11.11" }

      - name: Install uv and Databricks CLI
        run: |
          python -m pip install --upgrade pip
          pip install uv
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
          databricks -v
        shell: bash

      - name: Install project dependencies (uv)
        run: |
          uv venv
          uv pip install .
        shell: bash

      - name: Configure Databricks env (Dev)
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
        run: |
          databricks version

      - name: Deploy bundle to Dev (pin to current commit)
        working-directory: pipelines/pdp
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
          DB_WORKSPACE: ${{ secrets.DEV_DB_WORKSPACE }}
          SA_EXECUTER: ${{ secrets.DEV_SERVICE_ACCOUNT_EXECUTER }}
          DS_RUN_AS: ${{ secrets.DEV_DS_RUN_AS }}
          GROUP_TO_MANAGE: ${{ secrets.GROUP_TO_MANAGE }}
          DATAKIND_EMAIL: ${{ secrets.DATAKIND_EMAIL }}
        run: |
          databricks bundle deploy \
            --target=dev \
            --var "git_commit=${GITHUB_SHA}" \
            --var="DB_workspace=$DB_WORKSPACE" \
            --var="service_account_executer=$SA_EXECUTER" \
            --var="ds_run_as=$DS_RUN_AS" \
            --var "databricks_institution_name=synthetic_integration" \
            --var="datakind_group_to_manage_workflow=$GROUP_TO_MANAGE" \
            --var="datakind_notification_email=$DATAKIND_EMAIL"

      - name: TRAIN (Dev – develop-release-check)
        working-directory: pipelines/pdp
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
        run: |
          databricks bundle run github_sourced_pdp_training_pipeline --target dev \
            --var "git_commit=${GITHUB_SHA}" \
            --var "DB_workspace=${{ secrets.DEV_DB_WORKSPACE }}" \
            --var "service_account_executer=${{ secrets.DEV_SERVICE_ACCOUNT_EXECUTER }}" \
            --var "ds_run_as=${{ secrets.DEV_DS_RUN_AS }}" \
            --var "datakind_group_to_manage_workflow=${{ secrets.GROUP_TO_MANAGE }}" \
            --var "databricks_institution_name=synthetic_integration" \
            --var "datakind_notification_email=${{ secrets.DATAKIND_EMAIL }}" \
            --var "DK_CC_EMAIL=${{ secrets.DATAKIND_EMAIL }}" \
            --params config_file_name="$CONFIG_FILE",job_type=training,db_run_id="${DB_RUN_ID_PREFIX}_train"

      - name: INFER (Dev – develop-release-check)
        working-directory: pipelines/pdp
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
        run: |
          databricks bundle run edvise_github_sourced_pdp_inference_pipeline --target dev \
            --var "git_commit=${GITHUB_SHA}" \
            --var "DB_workspace=${{ secrets.DEV_DB_WORKSPACE }}" \
            --var "service_account_executer=${{ secrets.DEV_SERVICE_ACCOUNT_EXECUTER }}" \
            --var "ds_run_as=${{ secrets.DEV_DS_RUN_AS }}" \
            --var "datakind_group_to_manage_workflow=${{ secrets.GROUP_TO_MANAGE }}" \
            --var "databricks_institution_name=synthetic_integration" \
            --var "datakind_notification_email=${{ secrets.DATAKIND_EMAIL }}" \
            --var "DK_CC_EMAIL=${{ secrets.DATAKIND_EMAIL }}" \
            --params "model_name=synthetic_integration_retention_2_year_time_first_within_cohort,model_type=h2o,config_file_name=${CONFIG_FILE},job_type=inference,db_run_id=${DB_RUN_ID_PREFIX}_inference"
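
The TRAIN and INFER steps above pass job parameters as a single comma-separated `key=value` string on one line. If that string grows unwieldy, one option is to assemble it from separate arguments in bash. The sketch below is illustrative only; `join_params` is a hypothetical helper, not something the PR or the Databricks CLI provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): join key=value arguments with commas.
join_params() {
  # "$*" joins the positional arguments using the first character of IFS;
  # `local` keeps the IFS change confined to this function.
  local IFS=','
  printf '%s\n' "$*"
}

# Example use inside a run: block (variable names taken from the workflow):
#   params="$(join_params \
#     "config_file_name=${CONFIG_FILE}" \
#     "job_type=training" \
#     "db_run_id=${DB_RUN_ID_PREFIX}_train")"
#   databricks bundle run github_sourced_pdp_training_pipeline --target dev --params "$params"
```

This keeps each parameter on its own line in the script while still handing the CLI one single-line value.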

62 changes: 62 additions & 0 deletions .github/workflows/weekly-cleanup.yml
@@ -0,0 +1,62 @@
name: weekly-cleanup

on:
  schedule:
    # 16:10 UTC on days 1-7 of each month and on every Monday: cron treats
    # day-of-month and day-of-week as alternatives when both are restricted,
    # so this is not limited to the first Monday of the month.
    - cron: "10 16 1-7 * 1"
  workflow_dispatch: {}

jobs:
  cleanup-synthetic:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event_name == 'schedule' && 'develop' || github.ref_name }}
          fetch-depth: 0

      - uses: actions/setup-python@v5
        with: { python-version: "3.11.11" }

      - name: Install uv and Databricks CLI
        run: |
          python -m pip install --upgrade pip
          pip install uv
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

      - name: Install project deps
        run: |
          uv venv
          uv pip install .

      - name: Deploy bundle (dev)
        working-directory: pipelines/pdp
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
        run: |
          databricks bundle deploy --target dev \
            --var "git_commit=${GITHUB_SHA}" \
            --var "DB_workspace=${{ secrets.DEV_DB_WORKSPACE }}" \
            --var "databricks_institution_name=synthetic_integration" \
            --var "datakind_notification_email=${{ secrets.DATAKIND_EMAIL }}" \
            --var "service_account_executer=${{ secrets.DEV_SERVICE_ACCOUNT_EXECUTER }}" \
            --var "ds_run_as=${{ secrets.DEV_DS_RUN_AS }}" \
            --var "datakind_group_to_manage_workflow=${{ secrets.GROUP_TO_MANAGE }}"

      - name: CLEANUP
        working-directory: pipelines/pdp
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_DEV_HOST }}
          DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_DEV_CLIENT_ID }}
          DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_DEV_CLIENT_SECRET }}
        run: |
          databricks bundle run edvise_cleanup_synthetic --target dev \
            --var "git_commit=${GITHUB_SHA}" \
            --var "DB_workspace=${{ secrets.DEV_DB_WORKSPACE }}" \
            --var "databricks_institution_name=synthetic_integration" \
            --var "datakind_notification_email=${{ secrets.DATAKIND_EMAIL }}" \
            --var "service_account_executer=${{ secrets.DEV_SERVICE_ACCOUNT_EXECUTER }}" \
            --var "ds_run_as=${{ secrets.DEV_DS_RUN_AS }}" \
            --var "datakind_group_to_manage_workflow=${{ secrets.GROUP_TO_MANAGE }}" \
            --params dry_run=false,delete_models=true,delete_experiments=true,retention_days=30,databricks_institution_name=synthetic_integration
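
In POSIX cron, restricting both the day-of-month and the day-of-week field makes a job run when either one matches, so `1-7 * 1` on its own does not isolate the first Monday of the month. If a first-Monday run is what is wanted, one common workaround is to schedule on days 1-7 only (`10 16 1-7 * *`) and add a weekday guard as the first step. The sketch below assumes GNU `date`, which is an assumption about the runner image; `is_first_monday` is a hypothetical helper, not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): exit status 0 if the given
# YYYY-MM-DD date (interpreted in UTC) is the first Monday of its month.
# Relies on GNU date for the -d option and the %-d format.
is_first_monday() {
  local day_of_week day_of_month
  day_of_week="$(date -u -d "$1" +%u)"    # 1 = Monday ... 7 = Sunday
  day_of_month="$(date -u -d "$1" +%-d)"  # day of month without zero padding
  [ "$day_of_week" -eq 1 ] && [ "$day_of_month" -le 7 ]
}
```

A guard step could then run `is_first_monday "$(date -u +%F)" || { echo "Not the first Monday, skipping"; exit 0; }`. Note that `exit 0` ends only that step, so later steps would need their own condition (for example, a step output checked with `if:`) to be skipped as well.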