feat: implement async scheduling admission control #661
Merged
Changes from 17 commits
Commits
36 commits
- 591d803 docs: add async scheduling epic UML reference (eric-tramel)
- c03bedc docs: align async scheduling issue map (eric-tramel)
- 7125435 docs: base async scheduling artifacts in plans (eric-tramel)
- 0552243 docs: clarify async scheduler control ownership (eric-tramel)
- 451c1ff docs: align task admission class flow (eric-tramel)
- f617101 Update plans/645/async-scheduling-epic.puml (eric-tramel)
- 844bfbb Update plans/645/async-scheduling-epic.puml (eric-tramel)
- ad7bb87 docs: add async scheduling source-of-truth plan (eric-tramel)
- 59dbb47 docs: tighten async scheduling plan contracts (eric-tramel)
- d40e27c feat: implement async scheduling admission control (eric-tramel)
- 8d1f2e1 refactor async scheduling module ownership (eric-tramel)
- ea3d4a0 improve async scheduler idle observability (eric-tramel)
- 9a8312a fix request pressure domain metadata (#679) (andreatgretel)
- 2c4379a fix: harden async scheduling admission follow-ups (#680) (nabinchha)
- 387d5a5 fix request waiter deadline admission (#681) (andreatgretel)
- 0ee8f6d fix: keep async idle benchmark artifacts in scratch (#683) (andreatgretel)
- 0d2050b test: pin custom generator error boundary (#684) (andreatgretel)
- 70974fd fix: request admission edge cases (#685) (nabinchha)
- 05852e4 fix: tighten request controller release semantics (#682) (andreatgretel)
- 1fcc6b5 chore: remove generated benchmark artifacts (eric-tramel)
- d5edf05 chore: remove local benchmark scripts (eric-tramel)
- 87a2143 chore: restore historical devnote posts (eric-tramel)
- 74eb432 chore: restore fern image assets (eric-tramel)
- bc83ae3 chore: restore latest devnotes index (eric-tramel)
- 50770a6 docs: add async scheduling epic plan (#658) (eric-tramel)
- 538891c Merge remote-tracking branch 'origin/epic/645-async-scheduling' into … (eric-tramel)
- 9dd92ab Add request admission tuning config (eric-tramel)
- c3a39bf Address Greptile admission telemetry feedback (eric-tramel)
- c97002a Address async scheduling review follow-ups (eric-tramel)
- da6fa56 Address review follow-up migration gaps (eric-tramel)
- aaa7ead Add deprecated throttle config shim (eric-tramel)
- 14673cd Rename request admission success window knob (eric-tramel)
- a06036f Preserve multi-model plugin alias scheduling (eric-tramel)
- b62406b Preserve row drops for generator built-in failures (eric-tramel)
- c2522e3 Validate async full-column generator results (eric-tramel)
- afde63b Merge origin/main into scheduling-yolo (eric-tramel)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
# 645 live bench nano AIMD scale lane

Ran in `/Users/etramel/src/DataDesigner` on branch `scheduling-yolo` with only `/tmp/dd_live_aimd_bench.py` and artifacts under this directory. No tracked repo files were edited.

## Configuration

- Model: `openai/openai/gpt-5-nano`
- Provider: `nvidia-internal`
- Temperature: omitted
- `skip_health_check=True`
- `DATA_DESIGNER_ASYNC_ENGINE=1`
- `DATA_DESIGNER_ASYNC_TRACE=1`
- `max_parallel_requests=16`
- AIMD initial limit: 1
- AIMD `increase_after_successes=16`
- Shape for final scenarios: 512 rows x 2 independent `LLMTextColumnConfig` columns = 1024 model generations
- Event instrumentation: `InMemoryAdmissionEventSink`, patched request admission init, model-client factory `request_event_sink`, and scheduler init `scheduler_event_sink`
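
The AIMD settings above imply a lower bound on how many successful requests the ramp needs before the limit reaches the cap. The following is a minimal sketch, assuming the limit grows by one after every `increase_after_successes` successes and is clamped at `max_parallel_requests`. The class name `AimdLimit`, the update rule, and the `decrease_factor` value are illustrative assumptions, not the project's actual controller.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AimdLimit:
    """Toy additive-increase / multiplicative-decrease concurrency limit.

    Hypothetical illustration only; not DataDesigner's implementation.
    """

    limit: int = 1                       # AIMD initial limit from the report
    cap: int = 16                        # max_parallel_requests from the report
    increase_after_successes: int = 16   # from the report
    decrease_factor: float = 0.5         # assumed; not stated in the report
    _streak: int = 0                     # successes since the last limit change

    def on_success(self) -> bool:
        """Record one success; return True if the limit increased."""
        self._streak += 1
        if self._streak >= self.increase_after_successes and self.limit < self.cap:
            self.limit += 1
            self._streak = 0
            return True
        return False

    def on_rate_limited(self) -> None:
        """Multiplicative decrease, never dropping below 1."""
        self.limit = max(1, int(self.limit * self.decrease_factor))
        self._streak = 0


def successes_to_cap(ctrl: AimdLimit) -> tuple[int, int]:
    """Feed successes until the cap is reached; return (successes, increases)."""
    successes = increases = 0
    while ctrl.limit < ctrl.cap:
        successes += 1
        if ctrl.on_success():
            increases += 1
    return successes, increases


print(successes_to_cap(AimdLimit()))  # (240, 15)
```

Under these assumptions the ramp from 1 to 16 takes 15 increases and at least 240 successful requests, which is a small fraction of the 1024 generations in each final scenario.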

## Scenarios

| Scenario | Buffer | Requests | Success | Failures | Wall (s) | Time to cap (s) | Max in-flight | Max waiters | Request wait p50 / p95 / max (s) |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| diagnostic-4x2-buffer32 | 32 | 8 | 8 | 0 | 5.586 | n/a | 2 | 7 | 2.638 / 3.871 / 3.887 |
| final-512x2-buffer32 | 32 | 1024 | 1024 | 0 | 104.823 | 50.332 | 16 | 63 | 3.180 / 19.371 / 29.990 |
| final-512x2-buffer512 | 512 | 1024 | 1024 | 0 | 114.522 | 57.062 | 16 | 63 | 3.283 / 22.949 / 35.324 |
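
The wall-clock and time-to-cap columns can be turned into two derived figures: average throughput over the whole run and the share of wall time spent ramping. The sketch below uses only numbers from the two final rows of the table; the helper name `summarize` and the derived metric names are assumptions made for illustration.

```python
# Values copied from the final-scenario rows of the table.
scenarios = {
    "final-512x2-buffer32": {"requests": 1024, "wall_s": 104.823, "time_to_cap_s": 50.332},
    "final-512x2-buffer512": {"requests": 1024, "wall_s": 114.522, "time_to_cap_s": 57.062},
}


def summarize(row: dict) -> dict:
    """Derive whole-run throughput and the fraction of wall time spent below the cap."""
    return {
        "throughput_rps": row["requests"] / row["wall_s"],
        "ramp_fraction": row["time_to_cap_s"] / row["wall_s"],
    }


for name, row in scenarios.items():
    s = summarize(row)
    print(f"{name}: {s['throughput_rps']:.2f} req/s, "
          f"ramp is {s['ramp_fraction']:.0%} of wall time")
```

Both runs average roughly 9 to 10 requests per second over the whole run, and in both the AIMD ramp accounts for about half of the wall time.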

## Observations

- Both final scenarios completed exactly 1024 `model_request_started` and 1024 `model_request_completed` events, with 0 failed model requests and no fallback model.
- AIMD limit increased monotonically from 1 through 16 in both final scenarios. There were 15 `request_limit_increased` events, 0 decreases, and 0 rate-limit events in each final scenario.
- Cap enforcement held: observed request in-flight max was 16 in both final scenarios, matching `max_parallel_requests=16`.
- `buffer_size=32` reached cap faster (50.332s) and completed faster (104.823s) than `buffer_size=512` (57.062s to cap, 114.522s wall).
- Request wait p95 was lower for `buffer_size=32` (19.371s) than `buffer_size=512` (22.949s).
- Traffic became steady after the initial AIMD ramp in both final scenarios; see each scenario's `flow_buckets.json` for per-second starts/completions and `monitor_samples.jsonl` for sampled pressure snapshots.
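
The cap-enforcement check above amounts to replaying start and completion events and tracking the peak number of concurrent requests. The following is a rough sketch of that replay. The event names come from the report, but the `(timestamp, kind)` tuple shape and the sample data are assumptions; the report does not document the schema of `request_events.jsonl`.

```python
STARTED = "model_request_started"
COMPLETED = "model_request_completed"


def max_in_flight(events):
    """Return the peak number of concurrently running requests.

    `events` is an iterable of (timestamp_seconds, kind) tuples (assumed shape).
    At equal timestamps, completions are applied before starts so that a slot
    released and reused at the same instant is not counted twice.
    """
    in_flight = peak = 0
    ordered = sorted(events, key=lambda e: (e[0], 0 if e[1] == COMPLETED else 1))
    for _, kind in ordered:
        if kind == STARTED:
            in_flight += 1
            peak = max(peak, in_flight)
        elif kind == COMPLETED:
            in_flight -= 1
    return peak


# Made-up sample: two requests overlap, then a third reuses a freed slot.
sample = [
    (0.0, STARTED),
    (0.1, STARTED),
    (0.5, COMPLETED),
    (0.5, STARTED),
    (0.9, COMPLETED),
    (1.0, COMPLETED),
]
print(max_in_flight(sample))  # 2
```

Comparing the returned peak against `max_parallel_requests` gives the same kind of check as the in-flight maximum of 16 reported for both final scenarios.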

## Artifacts

Each scenario directory contains:

- `timeline.jsonl`
- `request_events.jsonl`
- `monitor_samples.jsonl`
- `task_traces.csv`
- `task_traces.json`
- `flow_buckets.json`
- `summary.json`

Combined summary: `combined_summary.json`.