feat(swe): improve dataset generation pipeline with validation and progress monitoring (#13)
* refactor(model): replace openai/gpt-5.2-codex:nitro with moonshotai/kimi-k2.5:nitro
* feat(swe): improve dataset quality, speed, and validation pipeline
Overhaul the synthetic dataset generator to produce higher-quality
benchmarks faster and with better validation:
Quality improvements:
- Add min_description_length filter (30 chars) to reject PRs with empty
or very short descriptions, preventing blank/useless benchmark tasks
- Strip repository names, PR numbers, and GitHub URLs from generated
prompts via post-processing in prompt_rewriter to avoid leaking
project identity into benchmark tasks
- Add test script validation (validate_test_scripts) that checks that
shell scripts are non-empty and start with a shebang line, and that
every test file they reference exists in the submission set; failed
validation can be retried through the existing retry loop
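A minimal sketch of the description-length filter and the test script checks listed above, using only the standard library. The names `passes_description_filter`, `validate_test_script`, and `ScriptIssue` are hypothetical and only illustrate the checks; the actual `validate_test_scripts` signature is not shown in this commit message and may differ.

```rust
use std::collections::HashSet;

/// Assumed constant mirroring the 30-character minimum from the commit message.
const MIN_DESCRIPTION_LENGTH: usize = 30;

/// Returns true when a PR description is long enough to keep.
fn passes_description_filter(description: &str) -> bool {
    description.trim().chars().count() >= MIN_DESCRIPTION_LENGTH
}

/// Problems a test script can have (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum ScriptIssue {
    Empty,
    MissingShebang,
    MissingFile(String),
}

/// Checks one shell script against the set of submitted file paths.
/// `referenced` lists the test files the script is expected to run.
fn validate_test_script(
    body: &str,
    referenced: &[&str],
    submitted: &HashSet<&str>,
) -> Vec<ScriptIssue> {
    let mut issues = Vec::new();
    if body.trim().is_empty() {
        issues.push(ScriptIssue::Empty);
    } else if !body.starts_with("#!") {
        issues.push(ScriptIssue::MissingShebang);
    }
    // Every referenced test file must be part of the submission set.
    for path in referenced {
        if !submitted.contains(path) {
            issues.push(ScriptIssue::MissingFile(path.to_string()));
        }
    }
    issues
}
```

A caller could re-run generation while the returned list is non-empty, which is one plausible shape for the retry loop mentioned above.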
Observability:
- Add new progress module with ProgressCounters (shared atomics) and
ProgressMonitor (background tokio task) that logs pipeline stats
(filtered/extracted/scored/accepted) every 30s with ETA percentage
- Wire progress monitor into SweOrchestrator::run lifecycle
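The shared counters described above could look roughly like the following sketch. It uses only the standard library and leaves out the background tokio task that logs every 30s. The field names follow the stats listed in the commit message, while `summary`, `shared`, and the `target` parameter are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Lock-free counters shared between pipeline stages and the monitor.
#[derive(Default)]
struct ProgressCounters {
    filtered: AtomicU64,
    extracted: AtomicU64,
    scored: AtomicU64,
    accepted: AtomicU64,
}

impl ProgressCounters {
    /// Hypothetical constructor returning a handle that can be cloned per task.
    fn shared() -> Arc<Self> {
        Arc::new(Self::default())
    }

    /// Formats one log line; `target` is the assumed number of tasks wanted.
    fn summary(&self, target: u64) -> String {
        let accepted = self.accepted.load(Ordering::Relaxed);
        let percent = if target == 0 {
            100.0
        } else {
            accepted as f64 * 100.0 / target as f64
        };
        format!(
            "filtered={} extracted={} scored={} accepted={}/{} ({:.1}%)",
            self.filtered.load(Ordering::Relaxed),
            self.extracted.load(Ordering::Relaxed),
            self.scored.load(Ordering::Relaxed),
            accepted,
            target,
            percent
        )
    }
}
```

Relaxed ordering is enough here because the counters are only read for reporting and do not guard other data. A monitor task would call `summary` on a timer and write the result to the log.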
Build config:
- Simplify .cargo/config.toml linker to use cc instead of clang+mold
for broader compatibility