feat: environment-scoped release secrets (dev/prod R2 isolation) #157

Merged
ethenotethan merged 3 commits into swift-provider from fix/swift-release-test-soft-fail on May 13, 2026

ethenotethan commented May 13, 2026

Summary

  • DEV_/PROD_-prefixed repo secrets isolate R2 and the coordinator per environment. Both release-swift.yml and release-rust-bridge.yml pick the matching prefixed secrets in a resolve-env step using bash indirection, so no GitHub environments are needed.
  • Swift tests soft-fail on dev releases because the live MLX model cache may be incomplete on CI. Prod should re-enable set -euo pipefail.
  • Full model download: the --include filter on hf download is removed so that config.json and all other model files land in the Hugging Face cache.
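The resolve-env step itself is not shown in this conversation, so the following is only a minimal sketch of how bash indirection can select a prefixed secret. The function name `resolve_secret` is hypothetical, and the sketch assumes the prefixed secrets have already been mapped into the step's environment as plain variables.

```bash
#!/usr/bin/env bash
set -euo pipefail

# resolve_secret ENV NAME   (hypothetical helper, not taken from the workflows)
# Prints the value of DEV_<NAME> or PROD_<NAME>, chosen by ENV (dev|prod).
resolve_secret() {
  local env_name="$1" name="$2" prefix var
  case "$env_name" in
    dev)  prefix="DEV" ;;
    prod) prefix="PROD" ;;
    *)    echo "unknown environment: $env_name" >&2; return 1 ;;
  esac
  var="${prefix}_${name}"
  # ${!var} is bash indirect expansion: it expands the variable whose
  # name is stored in var. The :- default keeps set -u from aborting.
  if [[ -z "${!var:-}" ]]; then
    echo "missing secret: $var" >&2
    return 1
  fi
  printf '%s\n' "${!var}"
}

# Demo values: the bucket names listed in this PR's secrets table.
DEV_R2_BUCKET="d-inf-app-dev"
PROD_R2_BUCKET="d-inf-app-prod"

resolve_secret dev R2_BUCKET
resolve_secret prod R2_BUCKET
```

Run as-is, the two calls print `d-inf-app-dev` and `d-inf-app-prod`. An unknown environment or an unset secret returns a non-zero status instead of silently yielding an empty value.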

Required Repo Secrets

Add these prefixed secrets to the GitHub repo (Settings → Secrets → Actions):

| Secret | Dev Value | Prod Value |
| --- | --- | --- |
| DEV_R2_ACCESS_KEY_ID | Dev R2 token ID | |
| PROD_R2_ACCESS_KEY_ID | | Prod R2 token ID |
| DEV_R2_SECRET_ACCESS_KEY | Dev R2 token secret | |
| PROD_R2_SECRET_ACCESS_KEY | | Prod R2 token secret |
| DEV_R2_ENDPOINT | https://<acct>.r2.cloudflarestorage.com | |
| PROD_R2_ENDPOINT | | same or different |
| DEV_R2_BUCKET | d-inf-app-dev | |
| PROD_R2_BUCKET | | d-inf-app-prod |
| DEV_R2_PUBLIC_URL | Dev CDN URL | |
| PROD_R2_PUBLIC_URL | | Prod CDN URL |
| DEV_COORDINATOR_URL | https://api.dev.darkbloom.xyz | |
| PROD_COORDINATOR_URL | | https://api.darkbloom.dev |
| DEV_RELEASE_KEY | Dev release key | |
| PROD_RELEASE_KEY | | Prod release key |

Apple signing secrets (APPLE_CERTIFICATE_P12, APPLE_CERTIFICATE_PASSWORD, APPLE_ID, APPLE_APP_PASSWORD) stay as-is; they are shared across both environments.
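Because a missing prefixed secret would otherwise surface only partway through a release, a preflight check can be useful. This is a sketch under assumptions: `missing_secrets` and `REQUIRED_NAMES` are hypothetical names, and the secrets are assumed to be visible to the script as environment variables.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Unprefixed names of the environment-specific secrets listed in this PR.
REQUIRED_NAMES=(
  R2_ACCESS_KEY_ID R2_SECRET_ACCESS_KEY R2_ENDPOINT R2_BUCKET
  R2_PUBLIC_URL COORDINATOR_URL RELEASE_KEY
)

# missing_secrets PREFIX   (hypothetical helper)
# Prints every PREFIX_<NAME> that is unset or empty, one per line,
# and returns non-zero if at least one is missing.
missing_secrets() {
  local prefix="$1" name var rc=0
  for name in "${REQUIRED_NAMES[@]}"; do
    var="${prefix}_${name}"
    if [[ -z "${!var:-}" ]]; then
      printf '%s\n' "$var"
      rc=1
    fi
  done
  return "$rc"
}
```

A release step could call `missing_secrets DEV || exit 1` before uploading, which lists all absent names at once rather than failing on the first one.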

Test Plan

  1. Add DEV_* repo secrets
  2. Push v0.5.0-dev.2 tag → verify dev release uploads to d-inf-app-dev bucket
  3. Verify GET /v1/releases/latest on dev coordinator returns the new release
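Steps 2 and 3 can be partly scripted. The sketch below assumes a convention that is not stated in this PR: a tag with a `-dev` pre-release suffix maps to dev and everything else maps to prod. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# env_for_tag TAG   (hypothetical helper)
# Assumed convention: a "-dev" pre-release suffix (e.g. v0.5.0-dev.2)
# means dev; any other tag means prod.
env_for_tag() {
  case "$1" in
    *-dev|*-dev.*) echo dev ;;
    *)             echo prod ;;
  esac
}

# latest_release_url COORDINATOR_URL   (hypothetical helper)
# Builds the URL for the GET /v1/releases/latest check, tolerating a
# trailing slash on the coordinator URL.
latest_release_url() {
  printf '%s/v1/releases/latest\n' "${1%/}"
}

env_for_tag "v0.5.0-dev.2"
latest_release_url "https://api.dev.darkbloom.xyz"
```

The resulting URL can then be fetched with `curl -fsS "$(latest_release_url "$DEV_COORDINATOR_URL")"`. The response format is not described here, so checking it for the new version is left to the reader.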

…e isolation

- Move R2_BUCKET from vars to secrets so it participates in GitHub
  environment scoping (dev vs prod get different buckets/credentials)
- Add documentation header listing all environment-scoped secrets
  required per environment
- Soft-fail Swift unit tests on dev releases (live MLX model cache
  may be incomplete on CI)
- Download full model (remove --include filter) for deterministic
  CI cache seeding
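The soft-fail behaviour described in this commit could be expressed as a small wrapper. This is one possible shape, not the workflow's actual code: `run_tests` is a hypothetical name, and the environment value is assumed to be passed in by the caller.

```bash
#!/usr/bin/env bash
set -uo pipefail   # deliberately no -e: exit codes are handled explicitly

# run_tests ENV CMD...   (hypothetical helper)
# Runs CMD. On dev, a failure is reported as a warning and the function
# still returns 0; on any other environment the failure is propagated.
run_tests() {
  local env_name="$1"; shift
  local rc=0
  "$@" || rc=$?
  if (( rc != 0 )) && [[ "$env_name" == "dev" ]]; then
    # ::warning:: is a GitHub Actions workflow command that annotates the run.
    echo "::warning::tests failed with exit code $rc (ignored on dev)"
    return 0
  fi
  return "$rc"
}
```

A call such as `run_tests dev swift test` would keep a dev release going after a test failure, while `run_tests prod swift test` would still fail the job.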

vercel Bot commented May 13, 2026

Deployment failed with the following error:

You don't have permission to create a Preview Deployment for this Vercel project: d-inference.

View Documentation: https://vercel.com/docs/accounts/team-members-and-roles


vercel Bot commented May 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| d-inference-console-ui-dev | Ready | Preview | May 13, 2026 8:56am |


…tion

Both release workflows now resolve DEV_ or PROD_ prefixed repo secrets
in a resolve-env step using bash indirection — no GitHub environments
needed. The environment: gate is removed since secrets live at repo
level with prefixes.

Required repo secrets:
  DEV_R2_ACCESS_KEY_ID, PROD_R2_ACCESS_KEY_ID
  DEV_R2_SECRET_ACCESS_KEY, PROD_R2_SECRET_ACCESS_KEY
  DEV_R2_ENDPOINT, PROD_R2_ENDPOINT
  DEV_R2_BUCKET, PROD_R2_BUCKET
  DEV_R2_PUBLIC_URL, PROD_R2_PUBLIC_URL
  DEV_COORDINATOR_URL, PROD_COORDINATOR_URL
  DEV_RELEASE_KEY, PROD_RELEASE_KEY
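Once the prefix is known, later steps are simpler if they can read unprefixed names. One way to do that, offered as an assumption about how such a step might look rather than what the workflows do, is to mask each resolved value and append it to the file named by `GITHUB_ENV`. The function name `export_resolved` is hypothetical, and values are assumed to be single-line.

```bash
#!/usr/bin/env bash
set -euo pipefail

# export_resolved PREFIX NAME...   (hypothetical helper)
# For each NAME, reads PREFIX_NAME via indirect expansion, masks the value
# in the job log, and appends NAME=value to the file named by $GITHUB_ENV
# so that later steps see the unprefixed variable.
export_resolved() {
  local prefix="$1"; shift
  local name var
  for name in "$@"; do
    var="${prefix}_${name}"
    [[ -n "${!var:-}" ]] || { echo "missing secret: $var" >&2; return 1; }
    # ::add-mask:: is a GitHub Actions workflow command that redacts the
    # value from subsequent log output.
    echo "::add-mask::${!var}"
    printf '%s=%s\n' "$name" "${!var}" >> "$GITHUB_ENV"
  done
}
```

Inside a step this might be called as `export_resolved DEV R2_BUCKET R2_ENDPOINT COORDINATOR_URL`. Multi-line values would need the delimiter syntax that `GITHUB_ENV` supports, which this sketch does not handle.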

vercel Bot commented May 13, 2026

Deployment failed with the following error:

You don't have permission to create a Preview Deployment for this Vercel project: d-inference-landing.

View Documentation: https://vercel.com/docs/accounts/team-members-and-roles


github-actions Bot commented May 13, 2026

Benchmark Results

Runner: macos-15 (M1 Virtual) | Date: 2026-05-13 09:03 UTC

1-provider-streaming

1 providers, 1 users, 30 requests, concurrency=5, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB

Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 15.832s
Throughput 1.9 req/s
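The throughput figures in this report are consistent with total requests divided by total duration, rounded to one decimal place. That reading is inferred from the numbers, not documented by the benchmark tool, and `throughput` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# throughput REQUESTS SECONDS   (hypothetical helper)
# Prints requests per second rounded to one decimal place.
throughput() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

throughput 30 15.832   # the 1-provider-streaming run: prints 1.9
```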

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 1.424s 732ms 5.334s 10.557s
parse 30 53µs 15µs 333µs 389µs
reserve 30 2ms 1ms 5ms 6ms
route 30 709ms 379ms 1.098s 10.535s
queue_wait 17 1.251s 774ms 10.535s 10.535s
encrypt 30 148µs 140µs 197µs 198µs
dispatch 30 23µs 18µs 48µs 53µs
coordinator_to_provider 30 711ms 2ms 5.321s 5.321s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=52.633µs (threshold=1ms)
parse:p95<=5ms PASS p95=333µs (threshold=5ms)
reserve:mean<=50ms PASS mean=1.502833ms (threshold=50ms)
reserve:p95<=200ms PASS p95=5.117ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=148.3µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=197µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=22.7µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=48µs (threshold=50ms)
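Each assertion compares an observed duration with a threshold that may be in a different unit. The sketch below shows one way to reproduce that comparison by normalising both sides to nanoseconds. It is not the benchmark tool's implementation, and `to_ns` and `check_le` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# to_ns DURATION   (hypothetical helper)
# Converts a duration such as 52.633µs, 1ms or 5.334s to whole nanoseconds.
to_ns() {
  local d="$1" num factor
  case "$d" in
    *ns) num="${d%ns}"; factor=1 ;;
    *µs) num="${d%µs}"; factor=1000 ;;
    *ms) num="${d%ms}"; factor=1000000 ;;
    *s)  num="${d%s}";  factor=1000000000 ;;
    *)   echo "unrecognized duration: $d" >&2; return 1 ;;
  esac
  awk -v n="$num" -v f="$factor" 'BEGIN { printf "%.0f\n", n * f }'
}

# check_le OBSERVED THRESHOLD   (hypothetical helper)
# Prints PASS if OBSERVED <= THRESHOLD, otherwise FAIL.
check_le() {
  local a b
  a="$(to_ns "$1")"
  b="$(to_ns "$2")"
  if (( a <= b )); then echo PASS; else echo FAIL; fi
}

check_le 52.633µs 1ms   # parse:mean<=1ms from the run above
check_le 333µs 5ms      # parse:p95<=5ms from the run above
```

Both calls print `PASS`, matching the report. The unit suffixes are checked longest first so that `ms` and `µs` are not mistaken for plain seconds.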

1-provider-non-streaming

1 providers, 1 users, 20 requests, concurrency=5, streaming=false

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB

Metric Value
Total Requests 20
Success 20
Errors 0
Total Duration 7.514s
Throughput 2.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 20 1.791s 898ms 5.294s 5.294s
parse 20 25µs 17µs 92µs 92µs
reserve 20 10ms 2ms 39ms 39ms
route 20 372ms 0s 4.725s 4.725s
queue_wait 7 1.062s 474ms 4.725s 4.725s
encrypt 20 1ms 0s 20ms 20ms
dispatch 20 33µs 23µs 98µs 98µs
coordinator_to_provider 20 817ms 4ms 4.063s 4.063s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=24.7µs (threshold=1ms)
parse:p95<=5ms PASS p95=92µs (threshold=5ms)
reserve:mean<=50ms PASS mean=9.5389ms (threshold=50ms)
reserve:p95<=200ms PASS p95=39.064ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=1.14445ms (threshold=5ms)
encrypt:p95<=50ms PASS p95=20.186ms (threshold=50ms)
dispatch:mean<=5ms PASS mean=33.35µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=98µs (threshold=50ms)

7-provider-multi-model

7 providers, 5 users, 50 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 4 0.5 GB
mlx-community/gemma-3-270m-4bit 3 0.2 GB

Metric Value
Total Requests 50
Success 50
Errors 0
Total Duration 51.256s
Throughput 1.0 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 50 5.259s 409ms 28.188s 28.227s
parse 50 39µs 27µs 99µs 201µs
reserve 50 10ms 2ms 50ms 68ms
route 50 1.384s 0s 10.004s 20.044s
queue_wait 10 1.913s 2.447s 2.811s 2.811s
encrypt 50 214µs 151µs 524µs 668µs
dispatch 50 62µs 38µs 242µs 440µs
coordinator_to_provider 50 3.858s 14ms 28.122s 28.201s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=39.38µs (threshold=1ms)
parse:p95<=5ms PASS p95=99µs (threshold=5ms)
reserve:mean<=50ms PASS mean=9.70728ms (threshold=50ms)
reserve:p95<=200ms PASS p95=50.379ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=213.9µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=524µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=61.7µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=242µs (threshold=50ms)

3-provider-high-concurrency

3 providers, 10 users, 60 requests, concurrency=20, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB

Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 12.366s
Throughput 4.9 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 2.614s 918ms 8.701s 8.734s
parse 60 57µs 39µs 210µs 395µs
reserve 60 10ms 3ms 34ms 51ms
route 60 1.504s 779ms 8.586s 8.618s
queue_wait 44 2.051s 876ms 8.586s 8.618s
encrypt 60 204µs 152µs 550µs 633µs
dispatch 60 48µs 24µs 162µs 496µs
coordinator_to_provider 60 1.075s 16ms 5.428s 5.441s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=56.716µs (threshold=1ms)
parse:p95<=5ms PASS p95=210µs (threshold=5ms)
reserve:mean<=50ms PASS mean=10.442066ms (threshold=50ms)
reserve:p95<=200ms PASS p95=33.701ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=204µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=550µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=47.983µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=162µs (threshold=50ms)

1-provider-queue-saturation

1 providers, 10 users, 40 requests, concurrency=15, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB

Metric Value
Total Requests 40
Success 40
Errors 0
Total Duration 14.567s
Throughput 2.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 40 4.285s 3.351s 8.128s 8.609s
parse 40 67µs 44µs 256µs 596µs
reserve 40 8ms 7ms 18ms 20ms
route 40 3.751s 3.29s 8.082s 8.537s
queue_wait 34 4.413s 3.333s 8.083s 8.538s
encrypt 40 183µs 159µs 423µs 584µs
dispatch 40 60µs 28µs 542µs 589µs
coordinator_to_provider 40 514ms 2ms 5.08s 5.08s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=67.25µs (threshold=1ms)
parse:p95<=5ms PASS p95=256µs (threshold=5ms)
reserve:mean<=50ms PASS mean=7.65475ms (threshold=50ms)
reserve:p95<=200ms PASS p95=17.986ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=182.575µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=423µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=60.025µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=542µs (threshold=50ms)

3-provider-20-users

3 providers, 20 users, 60 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB

Metric Value
Total Requests 60
Success 60
Errors 0
Total Duration 11.76s
Throughput 5.1 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 60 853ms 12ms 5.066s 5.066s
parse 60 44µs 30µs 117µs 371µs
reserve 60 5ms 2ms 22ms 24ms
route 60 50µs 23µs 167µs 609µs
encrypt 60 203µs 139µs 483µs 529µs
dispatch 60 41µs 30µs 115µs 157µs
coordinator_to_provider 60 842ms 6ms 5.023s 5.038s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=43.533µs (threshold=1ms)
parse:p95<=5ms PASS p95=117µs (threshold=5ms)
reserve:mean<=50ms PASS mean=5.245133ms (threshold=50ms)
reserve:p95<=200ms PASS p95=21.64ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=203.2µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=483µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=40.533µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=115µs (threshold=50ms)

1-provider-scaling

1 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 1 0.5 GB

Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 9.392s
Throughput 3.2 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 2.344s 1.237s 5.775s 5.778s
parse 30 35µs 28µs 106µs 109µs
reserve 30 4ms 1ms 14ms 16ms
route 30 1.721s 1.01s 5.739s 5.739s
queue_wait 25 2.065s 1.013s 5.739s 5.739s
encrypt 30 177µs 142µs 417µs 428µs
dispatch 30 38µs 26µs 115µs 124µs
coordinator_to_provider 30 612ms 5ms 4.566s 4.566s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=34.9µs (threshold=1ms)
parse:p95<=5ms PASS p95=106µs (threshold=5ms)
reserve:mean<=50ms PASS mean=4.173533ms (threshold=50ms)
reserve:p95<=200ms PASS p95=14.129ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=176.666µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=417µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=37.8µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=115µs (threshold=50ms)

3-provider-scaling

3 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB

Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 8.073s
Throughput 3.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 1.665s 14ms 4.994s 4.994s
parse 30 0s 0s 0s 3ms
reserve 30 6ms 3ms 25ms 26ms
route 30 41µs 18µs 176µs 198µs
encrypt 30 0s 0s 0s 2ms
dispatch 30 27µs 24µs 43µs 50µs
coordinator_to_provider 30 1.648s 9ms 4.962s 4.962s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=151.3µs (threshold=1ms)
parse:p95<=5ms PASS p95=478µs (threshold=5ms)
reserve:mean<=50ms PASS mean=6.481333ms (threshold=50ms)
reserve:p95<=200ms PASS p95=24.69ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=192.766µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=222µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=26.5µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=43µs (threshold=50ms)

5-provider-scaling

5 providers, 5 users, 30 requests, concurrency=10, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 5 0.5 GB

Metric Value
Total Requests 30
Success 30
Errors 0
Total Duration 17.679s
Throughput 1.7 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 30 4.019s 11ms 12.23s 12.232s
parse 30 56µs 38µs 150µs 157µs
reserve 30 9ms 4ms 31ms 37ms
route 30 1.667s 0s 10.003s 10.003s
encrypt 30 172µs 146µs 392µs 559µs
dispatch 30 57µs 42µs 135µs 194µs
coordinator_to_provider 30 2.333s 4ms 12.131s 12.188s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=55.933µs (threshold=1ms)
parse:p95<=5ms PASS p95=150µs (threshold=5ms)
reserve:mean<=50ms PASS mean=9.268733ms (threshold=50ms)
reserve:p95<=200ms PASS p95=30.9ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=171.6µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=392µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=56.866µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=135µs (threshold=50ms)

3-provider-heavy-100conc-10kb

3 providers, 20 users, 100 requests, concurrency=100, streaming=true

Model Providers RAM
mlx-community/Qwen3.5-0.8B-MLX-4bit 3 0.5 GB

Metric Value
Total Requests 100
Success 100
Errors 0
Total Duration 15.493s
Throughput 6.5 req/s

Latency Decomposition

Segment Count Mean P50 P95 Max
total_e2e 100 10.36s 10.691s 14.749s 14.995s
parse 100 149µs 118µs 292µs 893µs
reserve 100 41ms 46ms 53ms 57ms
route 100 9.6s 10.587s 14.638s 14.879s
queue_wait 88 10.91s 10.69s 14.638s 14.879s
encrypt 100 307µs 231µs 675µs 981µs
dispatch 100 77µs 59µs 161µs 274µs
coordinator_to_provider 100 669ms 11ms 5.462s 5.518s

Assertion Report: PASS

Assertion Result Detail
parse:mean<=1ms PASS mean=149.35µs (threshold=1ms)
parse:p95<=5ms PASS p95=292µs (threshold=5ms)
reserve:mean<=50ms PASS mean=40.88414ms (threshold=50ms)
reserve:p95<=200ms PASS p95=52.75ms (threshold=200ms)
encrypt:mean<=5ms PASS mean=307.45µs (threshold=5ms)
encrypt:p95<=50ms PASS p95=675µs (threshold=50ms)
dispatch:mean<=5ms PASS mean=77.12µs (threshold=5ms)
dispatch:p95<=50ms PASS p95=161µs (threshold=50ms)

ethenotethan merged commit 9185aca into swift-provider on May 13, 2026
3 of 5 checks passed