A/B Testing #480
Open: JasonMoho wants to merge 36 commits into main from dev-ab-testing-pool
Commits
fc46481 add champion/challenger A/B testing pool (JasonMoho)
55fc892 fix AB UX: real-time streaming, normal message layout, tie voting, re… (JasonMoho)
e2c94c7 fix ab testing deployment bugs (JasonMoho)
c2678fd fix streaming: emit thinking_start for empty AI chunks (JasonMoho)
323433c fix A/B streaming: flatten tool/thinking events for JS client (JasonMoho)
7633ec0 extract shared PipelineEventFormatter for streaming events (JasonMoho)
b6694ca eliminate code redundancies across app.py and chat.js (Phase 8) (JasonMoho)
d63cd79 fix A/B streaming: pre-warm models and add keep_alive for parallel ex… (JasonMoho)
48059e8 remove dead A/B code: stale ChatWrapper.create_ab_comparison (wrong 7… (JasonMoho)
d234402 gate A/B testing behind admin: backend 403 + frontend toggle hidden f… (JasonMoho)
050e7f9 add 'Create variant' button to clone agents with different tool configs (JasonMoho)
6420f70 add pool editor UI for A/B testing (JasonMoho)
9f5042a add quick variant creation, AB badges, and duplicate prevention (JasonMoho)
8c756f0 cleanup: remove prewarm, deduplicate A/B methods via conv_service, cu… (JasonMoho)
59b11c2 Merge remote-tracking branch 'origin/main' into dev-ab-testing-pool (JasonMoho)
f4dd773 remove docs not belonging to this PR (JasonMoho)
1950644 add Playwright E2E tests for A/B testing (33 tests) (JasonMoho)
d1de658 fix ab testing UX: feedback buttons, timers, variant labels, collapse… (JasonMoho)
56e8053 make service_chat.py resilient to missing deploy-time config keys (JasonMoho)
8610164 Merge branch 'main' into dev-ab-testing-pool (pmlugato)
2490364 continued dev, catch up, bug fixes for ab, restart, UI/UX (pmlugato)
f95a3a1 adding admin ab testing configuration page (pmlugato)
ad2f270 updates, fixes (pmlugato)
555b818 remove temp basic admin role used for tests, update ui+unit tests (pmlugato)
578fee7 pass unit and playwright (pmlugato)
30af401 unit and playwright pt 2 (pmlugato)
5b4c876 playWRONG >:( (pmlugato)
918cbe2 added RBAC, store ab specs in postgres, per user ab sample rate slider (pmlugato)
cb4ea9f fixes and improvements (pmlugato)
f35b257 unit and playwright tests (pmlugato)
36a03af remaining test failures fixed (pmlugato)
dabe308 playwright updates (pmlugato)
9fb333a more playwright (pmlugato)
a5e3931 persist UI A/B changes across restarts, better naming, timing fixes, … (pmlugato)
f88a549 unit and playwright (pmlugato)
15c5c9e last playwright (pmlugato)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| # Submit76 deployment config | ||
| # Deploy with: | ||
| # ./configs/submit76/deploy-submit76.sh --live | ||
| # | ||
| # Target: mohoney@submit76.mit.edu | ||
| # Ollama must be running on submit76 at localhost:7870 with gpt-oss:120b pulled. | ||
| | ||
| name: my_archi | ||
| | ||
| services: | ||
| chat_app: | ||
| agent_class: CMSCompOpsAgent | ||
| agents_dir: examples/agents | ||
| default_provider: local | ||
| default_model: "qwen3:32b" | ||
| providers: | ||
| local: | ||
| enabled: true | ||
| base_url: http://localhost:7870 | ||
| mode: ollama | ||
| default_model: "qwen3:32b" | ||
| models: | ||
| - "gpt-oss:120b" | ||
| - "qwen3:32b" | ||
| port: 7865 | ||
| external_port: 7865 | ||
| ab_testing: | ||
| enabled: true | ||
| pool: | ||
| champion: default | ||
| variants: | ||
| - name: default | ||
| provider: local | ||
| model: "qwen3:32b" | ||
| - name: gpt-oss-120b | ||
| provider: local | ||
| model: "gpt-oss:120b" | ||
| postgres: | ||
| port: 5435 | ||
| data_manager: | ||
| port: 7878 | ||
| external_port: 7878 | ||
| auth: | ||
| enabled: true | ||
| | ||
| data_manager: | ||
| sources: | ||
| jira: | ||
| enabled: true | ||
| max_tickets: 10 | ||
| url: https://its.cern.ch/jira/ | ||
| projects: | ||
| - "CMSPROD" | ||
| links: | ||
| input_lists: | ||
| - /home/submit/pmlugato/random_configs/lists/sso_git.list | ||
| redmine: | ||
| url: https://cleo.mit.edu | ||
| project: emails-to-ticket | ||
| projects: | ||
| - emails-to-ticket | ||
| embedding_name: HuggingFaceEmbeddings |
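
The `ab_testing` block in this config names a champion and a list of variants, each tied to a provider and model. As a rough illustration of the consistency such a pool presumably needs, here is a minimal Python sketch. It is not code from this PR: `check_ab_pool`, `AB_TESTING`, and `LOCAL_MODELS` are hypothetical names, and the nesting of `champion` and `variants` under `pool` is an assumption, since the diff view dropped the YAML indentation.

```python
from typing import Any

# Mirrors the ab_testing block from the diff. The nesting is an assumption:
# the scraped diff lost its YAML indentation.
AB_TESTING: dict[str, Any] = {
    "enabled": True,
    "pool": {
        "champion": "default",
        "variants": [
            {"name": "default", "provider": "local", "model": "qwen3:32b"},
            {"name": "gpt-oss-120b", "provider": "local", "model": "gpt-oss:120b"},
        ],
    },
}

# Models listed for the local provider in the same config.
LOCAL_MODELS = ["gpt-oss:120b", "qwen3:32b"]


def check_ab_pool(ab: dict[str, Any], available_models: list[str]) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means the pool looks consistent."""
    problems: list[str] = []
    pool = ab.get("pool", {})
    variants = pool.get("variants", [])
    names = [v.get("name") for v in variants]

    # Variant names should be unique so votes can be attributed unambiguously.
    if len(names) != len(set(names)):
        problems.append("variant names are not unique")
    # The champion must refer to one of the listed variants.
    if pool.get("champion") not in names:
        problems.append(f"champion {pool.get('champion')!r} is not a listed variant")
    # A comparison needs at least two variants.
    if ab.get("enabled") and len(variants) < 2:
        problems.append("A/B testing is enabled but fewer than two variants are defined")
    # Every variant should use a model the provider actually lists.
    for v in variants:
        if v.get("model") not in available_models:
            problems.append(
                f"variant {v.get('name')!r} uses unlisted model {v.get('model')!r}"
            )
    return problems


if __name__ == "__main__":
    for problem in check_ab_pool(AB_TESTING, LOCAL_MODELS):
        print(problem)
```

Returning a list of problems instead of raising on the first one lets a deploy script report every inconsistency in a single pass. Whether the project itself validates the pool this way is not shown in the diff.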