API polish: recommended defaults, structured notes, and trust metadata for policy comparisons #41
|
Ready-made prompt for Codex for a full polish of the tutorial.
Perform a focused cleanup and redesign pass for
Important context:
Main goals:
The notebook should read like a stable product tutorial, not a release note.
Try to reduce repeated sections and repeated explanations.
Acceptable approaches:
The final notebook should not spam repeated lines like
Reduce emphasis on low-level manual estimator plumbing unless it is pedagogically necessary.
Keep this concise and practical.
Non-goals:
Deliverables:
Nice-to-have direction:
If you want to narrow the task even further, you can start by limiting it to only:
|
|
New prompt for a proper rework of the tutorial. The emphasis is not on cosmetics but on teaching clarity: method-vs-oracle, validity of applying the methods, trust in the estimates, and the high-level API as the primary path.
Redesign
Important context:
Main goals:
Use a stable product-style tone.
The notebook should not feel like a low-level estimator plumbing demo first and an official API demo second.
Because the notebook uses synthetic data, compute oracle truth and then show a single clear table comparing all implemented estimators on that same dataset. At minimum include rows for:
At minimum include columns like:
The goal is that a reader can immediately see:
For example:
Keep this practical and compact.
Use the current structured outputs of the library where possible:
This section should be short but very clear.
The main educational value should come from:
Validation expectations:
Non-goals:
Deliverables:
Important quality bar:
If you want to do the task in stages, you can start with only:
|
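The oracle comparison requested in the prompt above (compute oracle truth on synthetic data, then show one table comparing estimators on that same dataset) might be sketched roughly as follows. This is only an illustration under stated assumptions: the reward model, the policies, and every function name here are hypothetical and are not taken from the library; only a naive baseline and a plain inverse-propensity (IPS) estimate are shown.

```python
import random


def expected_reward(x: float, action: int) -> float:
    """Known reward model of the synthetic world; this is what makes an oracle possible."""
    return 0.2 + 0.5 * x if action == 1 else 0.4


def target_policy(x: float) -> int:
    """Deterministic policy under evaluation (hypothetical): act only for high-x users."""
    return 1 if x > 0.5 else 0


def simulate(n: int, seed: int = 0):
    """Log (context, action, reward, propensity) under a uniform logging policy."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        a = rng.randint(0, 1)  # uniform logging policy, so propensity is 0.5
        r = 1.0 if rng.random() < expected_reward(x, a) else 0.0
        rows.append((x, a, r, 0.5))
    return rows


def oracle_value(rows) -> float:
    """Oracle truth: average expected reward if the target policy had acted."""
    return sum(expected_reward(x, target_policy(x)) for x, _, _, _ in rows) / len(rows)


def naive_value(rows) -> float:
    """Mean logged reward; ignores the target policy, so it is a biased baseline."""
    return sum(r for _, _, r, _ in rows) / len(rows)


def ips_value(rows) -> float:
    """Inverse-propensity estimate: reweight rewards where logged and target actions agree."""
    return sum(r / p for x, a, r, p in rows if a == target_policy(x)) / len(rows)


def comparison_table(rows):
    """One row per estimator, all computed on the same dataset, with error vs. oracle."""
    oracle = oracle_value(rows)
    estimates = [
        ("oracle", oracle),
        ("naive_mean", naive_value(rows)),
        ("ips", ips_value(rows)),
    ]
    return [
        {"estimator": name, "value": value, "abs_error": abs(value - oracle)}
        for name, value in estimates
    ]


table = comparison_table(simulate(20000, seed=0))
for row in table:
    print(f"{row['estimator']:<12} {row['value']:.4f}  abs_error={row['abs_error']:.4f}")
```

Putting the oracle in the same table as the estimators lets a reader see at a glance which methods land near the truth on this dataset and which do not.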
|
Operational prompt and execution plan for the next Codex pass after merge to
Can Codex do this?
Yes, Codex can implement almost all of the next plan.
Best use of Codex here:
What still benefits from human review after Codex:
Recommended way to use Codex here
Do not ask for everything in one giant pass. Recommended sequence:
Pre-work before running Codex
Minimal pre-work only:
No special manual prep is required beyond that.
Prompt for Codex: Pass 1 + 2 combined but still focused
Rework the user-facing learning materials of the
Important context:
Main goals:
Recommended target structure:
You may keep
This notebook should emphasize:
The notebook must include a minimal copy-adapt-run code path for a user’s own DataFrame.
This notebook should:
If useful, include more than one synthetic scenario, for example:
Keep this notebook pedagogical, not benchmark-heavy.
This guide should answer questions like:
Keep the tone practical, responsible, and clear.
The goal is not perfect theoretical completeness, but actionable user guidance.
If script-like files remain in
Non-goals:
Deliverables:
Quality bar:
Operational advice for the Codex run
Ask Codex to:
After Codex finishes:
|
|
Updated plan after reviewing additional external feedback. Short verdict:
What is truly relevant now (must-fix / near-term)
What is relevant but should be delayed (later stage)
Updated implementation plan
Phase 1: correctness and portability fixes
Phase 2: define the canonical user-facing path
Phase 3: restructure teaching materials
Create / reorganize materials by user intent:
Phase 4: optional second wave
Practical prioritization
For the next Codex pass, the best order is:
This keeps the repo from becoming overloaded while still addressing the most user-visible and credibility-critical issues.
Motivation
Description
- `RECOMMENDED_ESTIMATOR`, `RECOMMENDED_PROPENSITY_SOURCE_WITH_LOGGED`, `RECOMMENDED_PROPENSITY_SOURCE_FALLBACK`, and `RECOMMENDED_CROSSFIT_ESTIMATORS`, and included `recommended_defaults` in `PolicyComparisonSummary` (`src/policyscope/comparison.py`).
- `PolicyComparisonSummary`: `info_notes`, `diagnostic_warnings`, `inference_warnings`, `trust_notes`, `trust_level`, and optional `recommendation`, while preserving legacy `notes` as an additive, backward-compatible combined view (`src/policyscope/comparison.py`).
- `_build_trust_metadata(...)` that summarizes diagnostics + inference warnings into `trust_level`/`recommendation` and nudges cross-fit guidance for estimators where helpful (`src/policyscope/comparison.py`).
- `decision_summary` to render CI level from `alpha` rather than hard-coded "95%" and to surface `trust_level`/`recommendation` when present (`src/policyscope/report.py`).
- `trust_level` into the validation harness outputs so experiment aggregates can include trust metadata (`src/policyscope/validation.py`).
- `README.md`, `docs/architecture.md`, `docs/validation_harness.md`.
- `tests/test_comparison.py`, `tests/test_bootstrap_report.py`.
Testing
- `PYTHONPATH=src pytest -q tests/test_comparison.py tests/test_bootstrap_report.py tests/test_docs_consistency.py tests/test_validation.py`.
- `29 passed`.
- `notes` field remains available for backward compatibility.
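To make the Description more concrete, here is a minimal sketch of how structured notes, a backward-compatible combined `notes` view, trust metadata, and an `alpha`-derived CI label could fit together. It is not the code in `src/policyscope/comparison.py` or `src/policyscope/report.py`: the class name `SummarySketch`, the helper `ci_label`, the "high"/"medium"/"low" label set, and the rule mapping warnings to a trust level are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SummarySketch:
    """Hypothetical stand-in for the structured-note fields listed in the Description."""

    info_notes: List[str] = field(default_factory=list)
    diagnostic_warnings: List[str] = field(default_factory=list)
    inference_warnings: List[str] = field(default_factory=list)
    trust_notes: List[str] = field(default_factory=list)
    trust_level: str = "high"  # assumed label set: "high" / "medium" / "low"
    recommendation: Optional[str] = None

    @property
    def notes(self) -> List[str]:
        """Legacy combined view: the structured lists concatenated in a fixed order."""
        return (
            self.info_notes
            + self.diagnostic_warnings
            + self.inference_warnings
            + self.trust_notes
        )


def build_trust_metadata(summary: SummarySketch) -> SummarySketch:
    """Assumed rule: no warnings -> high, one warning kind -> medium, both -> low."""
    kinds = int(bool(summary.diagnostic_warnings)) + int(bool(summary.inference_warnings))
    summary.trust_level = ("high", "medium", "low")[kinds]
    if kinds:
        summary.recommendation = "Review the warnings before acting on this comparison."
        summary.trust_notes = [f"trust_level={summary.trust_level}"]
    return summary


def ci_label(alpha: float) -> str:
    """Render the CI level from alpha instead of hard-coding '95%'."""
    return f"{(1 - alpha) * 100:g}% CI"


# Example strings below are made up for the demonstration.
summary = build_trust_metadata(
    SummarySketch(
        info_notes=["propensities taken from logs"],
        diagnostic_warnings=["low effective sample size"],
    )
)
print(summary.trust_level, summary.recommendation)
print(summary.notes)
print(ci_label(0.10))
```

Exposing `notes` as a derived view means existing callers that read one flat list keep working, while new callers can treat informational notes and warnings separately.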